Viewpoint invariant face recognition using independent component analysis and attractor networks

Marian Stewart Bartlett
University of California San Diego
The Salk Institute
La Jolla, CA 92037
marni@salk.edu

Terrence J. Sejnowski
University of California San Diego
Howard Hughes Medical Institute
The Salk Institute, La Jolla, CA 92037
terry@salk.edu

Abstract

We have explored two approaches to recognizing faces across changes in pose. First, we developed a representation of face images based on independent component analysis (ICA) and compared it to a principal component analysis (PCA) representation for face recognition. The ICA basis vectors for this data set were more spatially local than the PCA basis vectors, and the ICA representation had greater invariance to changes in pose. Second, we present a model for the development of viewpoint invariant responses to faces from visual experience in a biological system. The temporal continuity of natural visual experience was incorporated into an attractor network model by Hebbian learning following a lowpass temporal filter on unit activities. When combined with the temporal filter, a basic Hebbian update rule became a generalization of Griniasty et al. (1993), which associates temporally proximal input patterns into basins of attraction. The system acquired representations of faces that were largely independent of pose.

1 Independent component representations of faces

Important advances in face recognition have employed forms of principal component analysis, which considers only second-order moments of the input (Cottrell & Metcalfe, 1991; Turk & Pentland, 1991). Independent component analysis (ICA) is a generalization of principal component analysis (PCA) that decorrelates the higher-order moments of the input (Comon, 1994). In a task such as face recognition, much of the important information is contained in the high-order statistics of the images. A representational basis in which the high-order statistics are decorrelated may be more powerful for face recognition than one in which only the second-order statistics are decorrelated, as in PCA representations. We compared an ICA-based representation to a PCA-based representation for recognizing faces across changes in pose.
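To make the distinction concrete, the following toy sketch (not from the original paper; it assumes NumPy and scikit-learn, and uses the FastICA estimator rather than the infomax algorithm employed below) mixes two independent non-Gaussian sources: PCA whitening merely decorrelates the mixtures, while ICA recovers the sources themselves.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (uniform) sources, 10,000 samples each.
S = rng.uniform(-1, 1, size=(10_000, 2))

# Observe only linear mixtures of the sources.
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T

# PCA whitening removes second-order correlations, but its components are
# still linear blends of both sources.
Z = PCA(whiten=True).fit_transform(X)
print("covariance of PCA outputs:\n", np.round(np.cov(Z.T), 3))

# ICA additionally exploits higher-order statistics, so its outputs align
# with the original sources (up to permutation and sign).
U = FastICA(n_components=2, random_state=0).fit_transform(X)
corr = np.corrcoef(np.hstack([S, U]).T)[:2, 2:]
print("|corr| between true sources and ICA outputs:\n", np.round(np.abs(corr), 2))
```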

Figure 1: Examples from the image set (Beymer, 1994); the five poses span -30° to +30° in 15° steps.

The image set contained 200 images of faces, consisting of 40 subjects at each of five poses (Figure 1). The images were converted to vectors and comprised the rows of a 200 x 3600 data matrix, X. We consider the face images in X to be a linear mixture, X = AS, of an unknown set of statistically independent source images S, where A is an unknown mixing matrix (Figure 2). The sources are recovered by a matrix of learned filters, W, which produces statistically independent outputs, U = WX.

Figure 2: Image synthesis model. Unknown sources S pass through an unknown mixing process A to produce the face images X; the learned weights W separate X into the outputs U.

The weight matrix, W, was found through an unsupervised learning algorithm that maximizes the mutual information between the input and the output of a nonlinear transformation (Bell & Sejnowski, 1995). This algorithm has proven successful for separating randomly mixed auditory signals (the cocktail party problem), and has recently been applied to EEG signals (Makeig et al., 1996) and natural scenes (see Bell & Sejnowski, this volume). The independent component images contained in the rows of U are shown in Figure 3. In contrast to the principal components, all 200 independent components were spatially local. We took as our face representation the rows of the matrix A = W^{-1}, which give the linear combination of the source images in U that comprises each face image in X.

1.1 Face Recognition Performance: ICA vs. Eigenfaces

We compared the performance of the ICA representation to that of the PCA representation for recognizing faces across changes in pose. The PCA representation of a face consisted of its principal component coefficients, which was equivalent to the "Eigenface" representation (Turk & Pentland, 1991).
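A minimal sketch of this representation pipeline, under stated assumptions: `faces` is a placeholder NumPy array standing in for the 200 x 3600 matrix X, and scikit-learn's FastICA is used as a stand-in for the Bell & Sejnowski (1995) infomax algorithm used in the paper (the two are different ICA estimators). Running ICA on X^T treats each image as one observed mixture, so the rows of the estimated mixing matrix A = W^{-1} give one coefficient vector per face.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical stand-in for the 200 x 3600 data matrix X (one image per row).
faces = np.random.default_rng(0).normal(size=(200, 3600))

# Treat pixels as samples and images as observed mixtures by passing X^T, so
# that the estimated sources are independent *images* (rows of U, 200 x 3600).
# FastICA is used here only as a convenient stand-in for the infomax algorithm.
ica = FastICA(n_components=200, max_iter=1000, random_state=0)
U = ica.fit_transform(faces.T).T        # estimated source images, 200 x 3600

# ica.mixing_ plays the role of A = W^{-1}: faces ~= A @ U + mean.
A = ica.mixing_                         # 200 x 200

# The ICA code of face k is row k of A: the coefficients that linearly combine
# the source images in U to reconstruct image k.
face_codes = A
reconstruction = face_codes @ U + ica.mean_[:, None]
print("max reconstruction error:", np.abs(reconstruction - faces).max())
```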

Figure 3: Top: Four independent components of the image set. Bottom: First four principal components.

A test image was recognized by assigning it the label of the nearest of the other 199 images in Euclidean distance. Recognition performance for the ICA and PCA representations and for the original graylevel images is presented in Table 1. For the PCA representation, the best performance was obtained with the 120 principal components corresponding to the highest eigenvalues; dropping the first three principal components, or selecting ranges of intermediate components, did not improve performance. The independent component sources were ordered by the magnitude of the weight vector (row of W) used to extract the source from the image.[1] Best performance was obtained with the 130 independent components with the largest weight vectors. Performance with the ICA representation was significantly superior to Eigenfaces by a paired t-test (p < 0.05).

                                   Graylevel Images    PCA     ICA
    Mutual Information                   .89           .10     .007
    Percent Correct Recognition          .83           .84     .87

Table 1: Mean mutual information between all pairs of 10 basis images, and between the original graylevel images. Face recognition performance is across all 200 images.

For the task of recognizing faces across pose, a statistically independent basis set provided a more powerful representation for face images than a principal component representation in which only the second-order statistics are decorrelated.

[1] The magnitude of the weight vector for optimally projecting the source onto the sloping part of the nonlinearity provides a measure of the variance of the original source (Tony Bell, personal communication).
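The recognition test is a leave-one-out nearest-neighbour match in Euclidean distance. A minimal sketch of that evaluation, assuming NumPy, a matrix `codes` with one coefficient vector per image, and integer `labels` giving subject identity (the placeholder data below is random, so the printed accuracy is near chance):

```python
import numpy as np

def nearest_neighbor_accuracy(codes, labels):
    """Leave-one-out recognition: each image gets the identity of the closest
    *other* image in Euclidean distance, as in Table 1."""
    sq = np.sum(codes ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * codes @ codes.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                            # exclude the image itself
    nearest = np.argmin(d2, axis=1)
    return float(np.mean(labels[nearest] == labels))

# Hypothetical example: 200 images = 40 subjects x 5 poses, 130-dim ICA codes.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(40), 5)
codes = rng.normal(size=(200, 130))
print(nearest_neighbor_accuracy(codes, labels))
```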

2 Unsupervised Learning of Viewpoint Invariant Representations of Faces in an Attractor Network

Cells in the primate inferior temporal lobe have been reported that respond selectively to faces despite substantial changes in viewpoint (Hasselmo, Rolls, Baylis, & Nalwa, 1989). Some cells responded independently of viewing angle, whereas other cells gave intermediate responses between a viewer-centered and an object-centered representation. This section addresses how a system can acquire such invariance to viewpoint from visual experience. During natural visual experience, different views of an object or face tend to appear in close temporal proximity as an animal manipulates the object or navigates around it, or as a face changes pose. Capturing such temporal relationships in the input is a way to automatically associate different views of an object without requiring three-dimensional descriptions.

Figure 4: Model architecture: a competitive Hebbian learning stage feeding an attractor network.

Hebbian learning can capture these temporal relationships in a feedforward system when the output unit activities are passed through a lowpass temporal filter (Foldiak, 1991; Wallis & Rolls, 1996). Such lowpass temporal filters have been related to the time course of the modifiable state of a neuron, based on the open time of the NMDA channel for calcium influx (Rhodes, 1992). We show that (1) this lowpass temporal filter increases viewpoint invariance of face representations in a feedforward system trained with competitive Hebbian learning, and (2) when the input patterns to an attractor network are passed through a lowpass temporal filter, a basic Hebbian weight update rule associates sequentially proximal input patterns into the same basin of attraction.

This simulation used a subset of 100 images from Section 1, consisting of twenty faces at five poses each. Images were presented to the model in sequential order as the subject changed pose from left to right (Figure 4). The first layer is an energy model related to the output of V1 complex cells (Heeger, 1991). The images were filtered by a set of sine and cosine Gabor filters at 4 spatial scales and 4 orientations at 255 spatial locations; the sine and cosine outputs were squared and summed. The set of V1 model outputs projected to a second layer of 70 units, grouped into two inhibitory pools.
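A sketch of the first-stage Gabor energy computation (quadrature sine/cosine pairs, squared and summed), assuming NumPy and SciPy; the kernel size, wavelengths, and envelope widths below are illustrative placeholders rather than the parameters of the original model:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size, wavelength, theta, sigma):
    """Even (cosine) and odd (sine) Gabor kernels at one scale/orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return (envelope * np.cos(2 * np.pi * xr / wavelength),
            envelope * np.sin(2 * np.pi * xr / wavelength))

def gabor_energy(image, wavelengths=(4, 8, 16, 32), n_orient=4):
    """Complex-cell energy: (cosine response)^2 + (sine response)^2,
    for 4 spatial scales x 4 orientations."""
    maps = []
    for lam in wavelengths:
        for k in range(n_orient):
            even, odd = gabor_pair(31, lam, k * np.pi / n_orient, lam / 2)
            c = fftconvolve(image, even, mode="same")
            s = fftconvolve(image, odd, mode="same")
            maps.append(c ** 2 + s ** 2)
    return np.stack(maps)    # (16, H, W); subsample to a grid of spatial
                             # locations to form the model's input vector

print(gabor_energy(np.random.default_rng(0).normal(size=(60, 60))).shape)
```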

The third stage of the model was an attractor network produced by lateral interconnections among all of the complex pattern units. The feedforward and lateral connections were trained successively.

2.1 Competitive Hebbian learning of temporal relationships

The Competitive Learning Algorithm (Rumelhart & Zipser, 1985) was extended to include a temporal lowpass filter on output unit activities (Bartlett & Sejnowski, 1996). This manipulation gives the winner in the previous time steps a competitive advantage for winning, and therefore learning, in the current time step.

    winner = \max_j \left[ \bar{y}_j^{(t)} \right], \qquad \bar{y}_j^{(t)} = \lambda y_j^{(t)} + (1 - \lambda)\, \bar{y}_j^{(t-1)}    (1)

The output activity of unit j at time t, \bar{y}_j^{(t)}, is determined by the trace, or running average, of its activation, where y_j is the weighted sum of the feedforward inputs. The winning unit's feedforward weights were adjusted by the competitive learning update \Delta w_{ij} = \alpha\,(x_i^u / \Sigma^u - w_{ij}), where \alpha is the learning rate, x_i^u is the value of input unit i for pattern u, and \Sigma^u is the total amount of input activation for pattern u. The weights to each unit were constrained to sum to one. This algorithm was used to train the feedforward connections. There was one face pattern per time step, and \lambda was set to 1 between individuals, so that the trace did not carry over from one individual to the next.

2.2 Lateral connections in the output layer form an attractor network

Hebbian learning of lateral interconnections, in combination with the lowpass temporal filter on the unit activities in (1), produces a learning rule that associates temporally proximal inputs into basins of attraction. We begin with a simple Hebbian learning rule

    w_{ij} = \frac{1}{N} \sum_{t=1}^{P} (y_i^t - y_0)(y_j^t - y_0)    (2)

where N is the number of units, P is the number of patterns, and y_0 is the mean activity over all of the units. Replacing y^t with the activity trace \bar{y}^{(t)} defined in (1), substituting \bar{y}_0 = \lambda y_0 + (1 - \lambda) y_0, and multiplying out the terms leads to the following learning rule:

    w_{ij} = \frac{1}{N} \sum_{t=1}^{P} \Big[ (y_i^t - y_0)(y_j^t - y_0) + k_1 \big( (y_i^t - y_0)(\bar{y}_j^{(t-1)} - y_0) + (\bar{y}_i^{(t-1)} - y_0)(y_j^t - y_0) \big) + k_2 (\bar{y}_i^{(t-1)} - y_0)(\bar{y}_j^{(t-1)} - y_0) \Big]    (3)

where k_1 = \lambda(1-\lambda)/\lambda^2 and k_2 = (1-\lambda)^2/\lambda^2. The first term in this equation is basic Hebbian learning, the second term associates pattern t with pattern t-1, and the third term is Hebbian association of the trace activity for pattern t-1. This learning rule is a generalization of an attractor network learning rule that has been shown to associate random input patterns into basins of attraction based on serial position in the input sequence (Griniasty, Tsodyks & Amit, 1993).
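A minimal sketch of Equations (1)-(3) as reconstructed above, assuming NumPy; the toy pattern sequence and variable names are illustrative:

```python
import numpy as np

def lateral_weights(Y, lam=0.5, y0=0.03):
    """Trace-augmented Hebbian rule of Equation (3).

    Y   : (P, N) array, one output pattern per time step (row).
    lam : trace parameter lambda of Equation (1); lam = 0.5 gives k1 = k2 = 1.
    y0  : mean activity over all units.
    """
    P, N = Y.shape
    k1 = lam * (1 - lam) / lam ** 2
    k2 = (1 - lam) ** 2 / lam ** 2

    # Running-average trace of Equation (1): ybar(t) = lam*y(t) + (1-lam)*ybar(t-1).
    Ybar = np.zeros_like(Y)
    Ybar[0] = Y[0]
    for t in range(1, P):
        Ybar[t] = lam * Y[t] + (1 - lam) * Ybar[t - 1]

    W = np.zeros((N, N))
    for t in range(1, P):
        d, dprev = Y[t] - y0, Ybar[t - 1] - y0
        W += (np.outer(d, d)                                   # basic Hebbian term
              + k1 * (np.outer(d, dprev) + np.outer(dprev, d)) # pattern t with t-1
              + k2 * np.outer(dprev, dprev))                   # trace-trace term
    return W / N

# Toy example: 100 sparse binary patterns over 70 units (as in the simulations).
rng = np.random.default_rng(0)
Y = (rng.random((100, 70)) < 0.03).astype(float)
W = lateral_weights(Y)
print(W.shape)
```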

The following update rule was used for the activation V_i of unit i at time t, driven by the lateral inputs (Griniasty, Tsodyks, & Amit, 1993):

    V_i(t + \delta t) = \Phi\!\left[ \sum_j w_{ij} V_j(t) - \theta \right]

where \theta is a neural threshold and \Phi(x) = 1 for x > 0, and 0 otherwise. In these simulations, \theta = 0.007, N = 70, P = 100, y_0 = 0.03, and \lambda = 0.5, which gave k_1 = k_2 = 1.

2.3 Results

Temporal association in the feedforward connections broadened the pose tuning of the output units (Figure 5, left). When the lateral connections in the output layer were added, the attractor network acquired responses that were largely invariant to pose (Figure 5, right).

Figure 5: Left: Correlation of the outputs of the feedforward system as a function of change in pose. Correlations across different views of the same face are compared to correlations across different faces, for the temporal trace parameter \lambda = 0.5 and for \lambda = 0. Right: Correlations in sustained activity patterns in the attractor network as a function of change in pose. Results obtained with Equation 3 (Hebb plus trace) are compared to Hebb only and to the learning rule in Griniasty et al. (1993). Test set results for Equation 3 were obtained by alternately training on four poses and testing on the fifth, then averaging across all test cases.

     F     N    F/N    Attractor network (% correct)    ICA (% correct)
     5    70    .07              1.00                        .96
    10    70    .14               .90                        .86
    20    70    .29               .61                        .89
    20   160    .13               .73                        .89

Table 2: Face classification performance of the attractor network for four ratios of the number of desired memories, F, to the number of units, N. Results are compared to ICA for the same subset of faces.
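A sketch of the threshold dynamics above, assuming NumPy; the toy weight matrix here is a plain covariance-style Hebbian matrix standing in for weights learned with Equation (3), and all specific values are placeholders:

```python
import numpy as np

def settle(W, v0, theta=0.007, n_steps=50):
    """Iterate V_i(t+dt) = Phi( sum_j W_ij V_j(t) - theta ), with Phi a step
    function, until the activity pattern stops changing (a sustained state)."""
    v = v0.copy()
    for _ in range(n_steps):
        v_new = (W @ v - theta > 0).astype(float)
        if np.array_equal(v_new, v):       # fixed point reached
            break
        v = v_new
    return v

# Toy weights over sparse binary patterns, standing in for Equation (3).
rng = np.random.default_rng(1)
N, P, y0 = 70, 100, 0.03
Y = (rng.random((P, N)) < y0).astype(float)
W = (Y - y0).T @ (Y - y0) / N

# Settle from two different input patterns and compare the sustained states.
s1, s2 = settle(W, Y[0]), settle(W, Y[1])
print("active units:", int(s1.sum()), int(s2.sum()),
      "shared:", int(np.sum(s1 * s2)))
```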

Classification accuracy of the attractor network was calculated by nearest neighbor on the activity states (Table 2). Performance of the attractor network depends both on the performance of the feedforward system, which comprises its input, and on the ratio of the number of patterns to be encoded in memory, F, to the number of units, N, where each individual in the face set comprises one memory pattern. The attractor network performed well when this ratio was sufficiently low. The ICA representation also performed well, especially for F = 20.

The goal of this simulation was to begin with structured inputs similar to the responses of V1 complex cells, and to explore the performance of unsupervised learning mechanisms that can transform these inputs into pose-invariant responses. We showed that a lowpass temporal filter on unit activities, which has been related to the time course of the modifiable state of a neuron (Rhodes, 1992), cooperates with Hebbian learning to (1) increase the viewpoint invariance of responses to faces in a feedforward system, and (2) create basins of attraction in an attractor network which associate temporally proximal inputs. These simulations demonstrated that viewpoint invariant representations of complex objects such as faces can be developed from visual experience by accessing the temporal structure of the input.

Acknowledgments

This project was supported by Lawrence Livermore National Laboratory ISCR Agreement B291528, and by the McDonnell-Pew Center for Cognitive Neuroscience at San Diego.

References

Bartlett, M. Stewart, & Sejnowski, T. 1996. Unsupervised learning of invariant representations of faces through temporal association. Computational Neuroscience: International Review of Neurobiology, Suppl. 1, J.M. Bower, Ed., Academic Press, San Diego, CA: 317-322.

Bell, A., & Sejnowski, T. 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7:1129-1159.

Bell, A., & Sejnowski, T. 1997. The independent components of natural scenes are edge filters. Advances in Neural Information Processing Systems 9.

Beymer, D. 1994. Face recognition under varying pose. In Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Comput. Soc. Press: 756-761.

Comon, P. 1994. Independent component analysis - a new concept? Signal Processing 36:287-314.

Cottrell, G., & Metcalfe, J. 1991. Face, gender and emotion recognition using holons. In Advances in Neural Information Processing Systems 3, D. Touretzky, Ed., Morgan Kaufmann, San Mateo, CA: 564-571.

Foldiak, P. 1991. Learning invariance from transformation sequences. Neural Computation 3:194-200.

Griniasty, M., Tsodyks, M., & Amit, D. 1993. Conversion of temporal correlations between stimuli to spatial correlations between attractors. Neural Computation 5:1-17.

Hasselmo, M., Rolls, E., Baylis, G., & Nalwa, V. 1989. Object-centered encoding by face-selective neurons in the cortex in the superior temporal sulcus of the monkey. Experimental Brain Research 75(2):417-429.

Heeger, D. 1991. Nonlinear model of neural responses in cat visual cortex. In Computational Models of Visual Processing, M. Landy & J. Movshon, Eds., MIT Press, Cambridge, MA.

Makeig, S., Bell, A.J., Jung, T.-P., & Sejnowski, T.J. 1996. Independent component analysis of electroencephalographic data. In Advances in Neural Information Processing Systems 8:145-151.

Rhodes, P. 1992. The long open time of the NMDA channel facilitates the self-organization of invariant object responses in cortex. Soc. Neurosci. Abst. 18:740.

Rumelhart, D., & Zipser, D. 1985. Feature discovery by competitive learning. Cognitive Science 9:75-112.

Turk, M., & Pentland, A. 1991. Eigenfaces for recognition. J. Cognitive Neuroscience 3(1):71-86.

Wallis, G., & Rolls, E. 1996. A model of invariant object recognition in the visual system. Technical Report, Oxford University Department of Experimental Psychology.