Distributed Signal Decorrelation and Detection in Sensor Networks Using the Vector Sparse Matrix Transform

Leonardo R. Bachega*, Student Member, IEEE, Srikanth Hariharan, Student Member, IEEE, Charles A. Bouman, Fellow, IEEE, and Ness Shroff, Fellow, IEEE

Abstract: In this paper, we propose the vector SMT, a new decorrelating transform suitable for performing distributed processing of high-dimensional signals in sensor networks. We assume that each sensor in the network encodes its measurements into vector outputs instead of scalar ones. The proposed transform decorrelates a sequence of pairs of vector sensor outputs, until these vectors are decorrelated. In our experiments, we simulate distributed anomaly detection by a network of cameras monitoring a spatial region. Each camera records an image of the monitored environment from its particular viewpoint and outputs a vector encoding the image. Our results, with both artificial and real data, show that the proposed vector SMT effectively decorrelates image measurements from the multiple cameras in the network while maintaining low overall communication energy consumption. Since it enables joint processing of the multiple vector outputs, our method provides significant improvements in anomaly detection accuracy compared to the baseline case in which the images are processed independently.

Index Terms: Sparse Matrix Transform, Wireless Sensor Networks, Smart Camera Networks, Distributed Signal Processing, Distributed Anomaly Detection, Multiview Image Processing, Pattern Recognition

I. INTRODUCTION

In recent years, there has been significant interest in the use of sensor networks for distributed monitoring in many applications [], []. In particular, networks with camera sensors have gained significant popularity [3], [4]. Consider the scenario where all cameras collectively monitor the same environment. Each camera registers an image of the environment from its specific viewpoint and encodes it into a vector output. As the number of deployed cameras grows, so does the combined data generated by all cameras. Since these cameras usually operate under limited battery power and narrow communication bandwidth, the data deluge created in large networks imposes serious challenges on the way data is communicated and processed. Event detection, and more specifically anomaly detection, are important applications for many sensor networks [5].

(L. R. Bachega and C. A. Bouman are with the School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Ave., West Lafayette, IN, USA; e-mail: {lbachega, bouman}@purdue.edu. S. Hariharan is with AT&T Labs, San Ramon, CA, USA; e-mail: srikanthhariharan@gmail.com. N. B. Shroff is with the Departments of ECE and CSE at the Ohio State University, Columbus, OH, USA; e-mail: shroff@ece.osu.edu. This material is based upon work supported, in whole or in part, by the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number 5654-CI, and by a grant from the Army Research Office MURI Award W911NF (SA-3).)

Fig. 1: A camera network where each camera captures an image of the environment from one viewpoint and encodes the image into a vector output. The aggregated outputs from all cameras form the high-dimensional vector x. Cameras i and j have overlapping views. Since outputs from cameras with overlapping views tend to be correlated, so is the aggregated vector x.

In general, the vector outputs from all sensors in a network can be concatenated to form a single p-dimensional vector x, and the goal of anomaly detection is then to determine whether x corresponds to a typical or an anomalous event. Fig. 1 illustrates this scenario for a network of cameras.
The vector outputs from different cameras in the network are likely to be correlated, particularly when the cameras capture overlapping portions of the scene; so, for best detection accuracy, the vector x should be decorrelated as part of the detection process. One possible approach to decorrelating x is to have all cameras send their vector outputs to a single sink node. This approach has several problems because it places a disproportionate and unscalable burden on the sink and on the communication links leading to it. One possible solution is to design a more powerful sink node. Unfortunately, a powerful sink node is not a suitable solution for the many applications that require nodes to operate in an ad hoc manner [6], [7], re-arranging themselves dynamically. Alternatively, each sensor can compute the likelihood of its vector measurement independently and send a single (scalar) likelihood value to the sink, which then combines the likelihoods computed by each sensor and makes a detection decision. While requiring minimal communication energy, this approach does not model correlations between camera outputs, potentially leading to poor detection accuracy. Because of the limitations above, there is a need for distributed algorithms which can decorrelate vector camera outputs without the use of a centralized sink, while keeping the communication among sensors low.

Several methods to compute the distributed Karhunen-Loève transform (KLT) and principal components analysis (PCA) in sensor networks have been proposed. Distributed PCA algorithms are proposed in [] and [9]. Both methods operate on scalar sensor outputs, and in order to constrain communication in the network, they assume that sensor outputs are conditionally independent given the outputs of neighboring sensors. A distributed KLT algorithm is proposed in [], [], [], [3] to compress/encode vector sensor outputs with the subsequent goal of reconstructing the aggregated output at the sink node with minimum mean-square error. Distributed decorrelation using a wavelet transform with lifting has been studied for sensor networks with a linear topology [4], two-dimensional networks [5], and networks with a tree topology [6]. While assuming specific network topologies and correlation models for scalar sensor outputs, these methods focus mainly on efficient data gathering and routing when sensor measurements are correlated. Moreover, these methods do not account for the fact that sensors far apart in the network can generate highly correlated outputs; for instance, two cameras pointing at the same event, and therefore producing correlated outputs, can be several hops apart from each other, as argued in [7].

Multiple efforts have been made in distributed detection since the early 1990s (see [] for a survey). Most approaches rely on encoding scalar sensor outputs efficiently to cope with low communication bandwidth and transmitting the encoded outputs to a fusion center in charge of making the final detection decisions. More recently, detection of volume anomalies in networks has been studied in [9], [], []. These approaches focus on scalar measurements on network links and rely on centralized data processing for anomaly detection. Several methods for video anomaly detection have been proposed (see [] for a survey). The method in [] uses multi-view images of a highway system to detect traffic anomalies, with each view monitoring a different road segment or intersection. The processing of the multiple views is non-distributed and the method does not model any correlations between views.

Accurate anomaly detection requires decorrelation of the background signal [3]. In order to decorrelate the background, we need an accurate estimate of its covariance matrix. Several methods to estimate covariances of high-dimensional signals have been proposed recently [4], [5], [6], [7], [], [9]. Among these methods, the Sparse Matrix Transform (SMT) [], here referred to as the scalar SMT, has been shown to be effective, providing full-rank covariance estimates of high-dimensional signals even when the number n of training samples used to compute the estimates is much smaller than the dimension p of a data sample, i.e., n << p. Furthermore, the decorrelating transform designed by the SMT algorithm consists of a product of O(p) Givens rotations, and is therefore computationally inexpensive to apply. The scalar SMT has been used in detection and classification of high-dimensional signals [3], [3], [3], and Givens rotations have been used in ICA [33]. Since it involves only pairwise operations between coordinate pairs, it is well suited to distributed decorrelation [34]. However, this existing method is only well suited for decorrelation of scalar sensor outputs.

In this paper, we propose the vector sparse matrix transform (vector SMT), a novel algorithm suited for distributed signal decorrelation in sensor networks where each sensor outputs a vector.
It generalizes the concept of the scalar sparse matrix transform in [] to the decorrelation of vectors. This novel algorithm operates on pairs of sensor outputs, and it has the interpretation of maximizing the constrained log likelihood of x. In particular, the vector SMT decorrelating transform is defined as an orthonormal transformation constrained to be formed by a product of pairwise transforms between pairs of vector sensor outputs. We design this transform using a greedy optimization of the likelihood function of x. Once this transform is designed, the associated pairwise transforms are applied to sensor outputs distributed over the network, without the need for a powerful central sink node. The total number of pairwise transforms is a model order parameter. By constraining the value of this model order parameter to be small, our method imposes a sparsity constraint on the data. When this sparsity constraint holds for the data being processed, the vector SMT can substantially improve the accuracy of the resulting decorrelating transform even when a limited number of training samples is available.

Being able to perform distributed decorrelation while consuming limited communication energy is an important characteristic of our method. Our primary way of limiting energy consumption is to select the model order parameter value such that the total energy required for distributed decorrelation is less than a specified budget. Another approach to limiting energy consumption is based on constrained likelihood optimization using Lagrange multipliers. Since sensor pairs that are far apart can be highly correlated, the unconstrained greedy optimization of the likelihood of x may result in pairwise transforms between sensors that are far apart, requiring prohibitive amounts of communication energy. To limit energy consumption in such a scenario, we constrain the greedy optimization of the likelihood function by adding to it a linear penalization term that models the energy required by the associated decorrelating transform. As a result, during the design of the decorrelating transformation, our method selects sensor pairs based on the correlation between their outputs while penalizing the ones that are several hops apart and require high energy consumption for their pairwise transforms.

We introduce the new concept of a correlation score, a measure of correlation between two vectors. This correlation score generalizes the concept of the correlation coefficient to pairs of random vectors. In fact, we show that the correlation score between two scalar random variables is the absolute value of their correlation coefficient. We use this correlation score to select pairs of most-correlated sensor outputs during the design of the vector SMT decorrelating transform, as part of the greedy optimization of the likelihood of x. We remark that this concept is closely related to the concepts of mutual information between two random vectors [35] and their total correlation [36].

To validate our method, we describe experiments using simulated data, artificially generated multi-camera image data of 3D spheres, and real multi-camera data of a courtyard. We use the vector SMT to decorrelate the data from multiple cameras in a simulated network for the purpose of anomaly detection.

We compare our method against centralized and independent approaches for processing the sensor outputs. The centralized approach relies on a sink node to decorrelate all sensor outputs and requires a large amount of energy to communicate all sensor data. The independent approach relies on each sensor to compute the partial likelihood of its output independently from the others and communicate the resulting value to the sink, which makes the final detection decision. While minimizing communication energy, this independent approach leads to poor detection accuracy since it does not take into account correlations between sensor outputs. Our results show that vector SMT decorrelation enables consistently more accurate anomaly detection across the experiments while keeping the communication energy required for distributed decorrelation low.

The rest of this paper is organized as follows. Sec. II describes the main concepts of the scalar SMT. Sec. III introduces the vector SMT algorithm, designed to perform distributed decorrelation of vector sensor outputs in a sensor network. Sec. IV shows how to use the vector SMT to enable distributed detection in a sensor network. Sec. V shows experimental results of detection using data from multi-camera views of objects as well as simulated data. Finally, the main conclusions and future work are discussed in Sec. VI.

II. THE SCALAR SPARSE MATRIX TRANSFORM

Let x be a p-dimensional random vector from a multivariate Gaussian distribution, N(0, R). Moreover, the covariance matrix R can be decomposed into R = E \Lambda E^t, where \Lambda is a diagonal matrix and E is orthonormal. The Sparse Matrix Transform (SMT) [] models the orthonormal matrix E as the product of K sparse matrices, E_1, ..., E_K, so that

E = \prod_{k=1}^{K} E_k.    (1)

In (1), each sparse matrix E_k, known as a Givens rotation, is a planar rotation over a coordinate pair (i_k, j_k) parametrized by an angle \theta_k, i.e.,

E_k = I + \Theta(i_k, j_k, \theta_k),    (2)

where

[\Theta]_{ij} = \begin{cases} \cos(\theta_k) - 1 & \text{if } i = j = i_k \text{ or } i = j = j_k \\ \sin(\theta_k) & \text{if } i = i_k \text{ and } j = j_k \\ -\sin(\theta_k) & \text{if } i = j_k \text{ and } j = i_k \\ 0 & \text{otherwise.} \end{cases}    (3)

This SMT model assumes that the K Givens rotations in (1) are sufficient to decorrelate the vector x. Each matrix E_k operates on a single coordinate pair of x, playing a role analogous to the decorrelating butterfly in the fast Fourier transform (FFT). Since both the ordering of the coordinate pairs (i_k, j_k) and the values of the rotation angles \theta_k are unconstrained, the SMT can model a much larger class of signal covariances than the FFT. In fact, the scalar SMT is a generalization of both the FFT and the orthonormal wavelet transform. Figs. 2(b) and 2(c) make a visual comparison of the FFT and the scalar SMT. The SMT rotations can operate on pairs of coordinates in any order, while in the FFT the butterflies are constrained to a well-defined sequence with specific rotation angles.

The scalar SMT design consists in learning the product in (1) from a set of n independent and identically distributed training vectors, X = [x_1, ..., x_n], from N(0, R). Assuming that R = E \Lambda E^t, the maximum likelihood estimates of E and \Lambda are given by

\hat{E} = \arg\min_{E \in \Omega_K} |\mathrm{diag}(E^t S E)|    (4)
\hat{\Lambda} = \mathrm{diag}(\hat{E}^t S \hat{E}),    (5)

where S = (1/n) X X^t, and \Omega_K is the set of allowed orthonormal transforms. The functions diag(.) and |.| are the diagonal and determinant, respectively, of a matrix argument. Under the SMT model assumption, the orthonormal transforms in \Omega_K are of the form (1), and the total number of planar rotations, K, is the model order parameter. When performing an unconstrained minimization of (4) by allowing the set \Omega_K to contain all orthonormal transforms, and when n > p, the minimizer \hat{E} is the orthonormal matrix that diagonalizes the sample covariance, i.e., \hat{E} \hat{\Lambda} \hat{E}^t = S. However, S is a poor estimate of R when n < p.
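To make the structure of (1)-(5) concrete, the following is a minimal Python/NumPy sketch of one possible greedy scalar SMT design and of applying the resulting rotations to a data vector. The function names, the squared-correlation pair-selection rule, and the Jacobi-style rotation angle are our own simplified assumptions for illustration, not the exact algorithm of [].

```python
import numpy as np

def design_scalar_smt(S, K):
    """Greedy scalar SMT design sketch (illustrative variant of Eq. (1)-(5)).
    S : p x p sample covariance, S = X X^T / n.
    K : number of Givens rotations (model order).
    Returns the list of rotations (i, j, theta) and the diagonal estimate."""
    S = S.copy()
    p = S.shape[0]
    rotations = []
    for _ in range(K):
        # Pick the coordinate pair with the largest squared correlation
        # coefficient; rotating it gives the largest drop in |diag(E^t S E)|.
        C = S**2 / np.outer(np.diag(S), np.diag(S))
        np.fill_diagonal(C, -np.inf)
        i, j = np.unravel_index(np.argmax(C), C.shape)
        # Jacobi angle that zeroes S[i, j] under the convention of Eq. (2)-(3).
        theta = 0.5 * np.arctan2(-2.0 * S[i, j], S[i, i] - S[j, j])
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(p)
        G[i, i] = c; G[j, j] = c; G[i, j] = s; G[j, i] = -s  # E_k = I + Theta
        S = G.T @ S @ G
        rotations.append((i, j, theta))
    return rotations, np.diag(S)        # (rotation list for E-hat, Lambda-hat)

def apply_smt(x, rotations):
    """Apply x_tilde = E-hat^t x, touching only two coordinates per rotation."""
    x = x.copy()
    for i, j, theta in rotations:
        c, s = np.cos(theta), np.sin(theta)
        xi, xj = x[i], x[j]
        x[i] = c * xi - s * xj
        x[j] = s * xi + c * xj
    return x
```

The application step makes the low cost of the SMT visible: each of the K rotations updates only two coordinates, which is the source of the roughly K/p operations per coordinate mentioned below.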
the iniizer,ê is the orthonoral atrix that diagonalizes the saple covariance, ie, ʈΛÊt = S However, S is a poor estiate of R when n < p As shown in [], the greedy optiization of (4) under the constraint that the allowed transfors are in the for of () yields accurate estiates even when n p The constraint in () is non-convex with no obvious closed for solution In [], we use a greedy optiization approach in which we select each Givens rotation, k, independently, in sequence to iniize the cost in (4) The odel order paraeter K can be estiated using crossvalidation over the training set [37], [3] or using the iniu description length (MDL) [3] Typically, the average nuber of rotations per coordinate, K/p is sall (< 5), so that the coputation to apply the SMT to a vector of data is very low, ie, (K/p)+ floating-point operations per coordinate Finally, when K = ( p ), the SMT factorization of R is equal to its exact diagonalization, a process known as Givens QR III DISTRIBUTD DCORRLATION WITH TH VCTOR SPARS MATRIX TRANSFORM Our goal is to decorrelate the p-diensional vector x aggregated fro outputs of all sensors, where each of the L sensors outputs an h-diensional sub-vector of x The vector SMT operates on x by decorrelating a sequence of pairs of its sub-vectors This vector SMT generalizes the concept of the scalar SMT in Sec II to the decorrelation of pairs of vectors instead of pairs of coordinates A The Vector SMT Model Let the p-diensional vector x be partitioned into L subvectors, x = x () x (L), where each sub-vector, x (i) is an h-diensional vector output fro a sensor i =,,L in a sensor network A 3

A vector SMT is an orthonormal p x p transform, T, written as the product of M orthonormal, sparse matrices,

T = \prod_{m=1}^{M} T_m,    (6)

where each pairwise transform T_m \in R^{p x p} is a block-wise sparse, orthonormal matrix that operates exclusively on the 2h-dimensional subspace of the sub-vector pair x^{(i_m)}, x^{(j_m)}, as illustrated in Fig. 2(a). The decorrelating transform is then formed by the product of the M pairwise transforms, where M is a model order parameter. Each T_m is a generalization of a Givens rotation in (2) to a transform that operates on pairs of sub-vectors instead of pairs of coordinates. Similarly, the vector SMT in (6) generalizes the concept of the scalar SMT in Sec. II: it decorrelates a high-dimensional vector by decorrelating pairs of its sub-vectors instead of pairs of coordinates. Figs. 2(b) and 2(d) compare the vector and scalar SMT approaches graphically. In the scalar SMT, each Givens rotation E_k plays the role of a decorrelating butterfly (Fig. 2(b)), and together the butterflies decorrelate x. In the vector SMT, each orthonormal matrix T_m corresponds to a series of decorrelating butterflies that operate exclusively on the coordinates of a single pair of sub-vectors of x. Finally, the sequence in (6), illustrated in Fig. 2(d), decorrelates M pairs of sub-vectors of x, until the decorrelated vector x-tilde is obtained.

In a sensor network, we compute the distributed decorrelation of x by distributing the application of the transforms T_m from the product (6) across multiple sensors. Before the decorrelation, each sub-vector x^{(i)} of x is the output of a sensor i and is stored locally at that sensor. Applying each T_m to sub-vectors x^{(i_m)}, x^{(j_m)} requires point-to-point communication of one h-dimensional sub-vector between sensors i_m and j_m, consuming an amount of energy, \mathcal{E}(h, i_m, j_m), proportional to some measure of the distance between these sensors. After applying T_m, the resulting decorrelated sub-vectors x-tilde^{(i_m)} and x-tilde^{(j_m)} are cached at the sensor used to compute this pairwise decorrelation, avoiding communicating one sub-vector back to its originating sensor. Finally, the total communication energy required for the entire decorrelation is given by

\mathcal{E}(h, i_1, ..., i_M, j_1, ..., j_M) = \sum_{m=1}^{M} \mathcal{E}(h, i_m, j_m).    (7)
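The experiments in Sec. V assume a hierarchical network with a binary-tree topology in which communicating one scalar between adjacent sensors costs one unit of energy, so \mathcal{E}(h, i_m, j_m) is h times the hop distance between the two sensors. A minimal sketch of the energy accounting in (7) under that assumption follows; the helper names are ours.

```python
from collections import deque

def hop_distance(adjacency, i, j):
    """Hop count between sensors i and j in a network graph given as
    {node: [neighbors]}, computed by breadth-first search."""
    if i == j:
        return 0
    seen, frontier = {i}, deque([(i, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nb in adjacency[node]:
            if nb == j:
                return d + 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    raise ValueError("sensors are not connected")

def pairwise_energy(h, i, j, adjacency):
    """Assumed energy model: one unit per scalar per hop, so moving one
    h-dimensional sub-vector costs h * hop_distance."""
    return h * hop_distance(adjacency, i, j)

def total_energy(h, pairs, adjacency):
    """Eq. (7): total communication energy of the M pairwise transforms."""
    return sum(pairwise_energy(h, i, j, adjacency) for (i, j) in pairs)
```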
B. The Design of the Vector SMT

We design the vector SMT decorrelating transform from training data, using maximum likelihood estimation of the data covariance matrix. Let X = [x_1, ..., x_n] \in R^{p x n} be a p x n matrix in which each column x_i is a p-dimensional, zero-mean Gaussian random vector with covariance R. In general, a covariance can be decomposed as R = T \Lambda T^t, where \Lambda is the diagonal eigenvalue matrix and T is an orthonormal matrix. In this case, the log likelihood of X given T and \Lambda is

\log p_{(T,\Lambda)}(X) = -\frac{n}{2} \mathrm{tr}\{\mathrm{diag}(T^t S T)\Lambda^{-1}\} - \frac{n}{2} \log\{(2\pi)^p |\Lambda|\},    (8)

where S = (1/n) X X^t. When constraining T to be of the product form (6), the joint maximum likelihood estimates of \Lambda and T are given by

\hat{T} = \arg\min_{T = \prod_{m=1}^{M} T_m} |\mathrm{diag}(T^t S T)|    (9)
\hat{\Lambda} = \mathrm{diag}(\hat{T}^t S \hat{T}).    (10)

Since the minimization in (9) has a non-convex constraint, its global minimizer is difficult to find. Therefore, we use a greedy procedure that designs each new T_m, m = 1, ..., M, independently while keeping the others fixed. We start by setting S_1 = S and X_1 = X, and iterate over the following steps:

\hat{T}_m = \arg\min_{T \in \Omega_m} |\mathrm{diag}(T^t S_m T)|    (11)
S_{m+1} = \hat{T}_m^t S_m \hat{T}_m    (12)
X_{m+1} = \hat{T}_m^t X_m,    (13)

where \Omega_m is the set of all allowed pairwise transforms. Since T_m operates exclusively on x^{(i_m)} and x^{(j_m)}, once the pair (i_m, j_m) is selected, the design of T_m involves only the components of X_m associated with these sub-vectors. Let X_m^{(i)} and X_m^{(j)} be the h x n sub-matrices of X_m associated with the sub-vector pair (i_m, j_m). Their associated 2h x 2h sample covariance is then given by

S_m^{(i,j)} = \frac{1}{n} \begin{bmatrix} X_m^{(i)} \\ X_m^{(j)} \end{bmatrix} \begin{bmatrix} X_m^{(i)t} & X_m^{(j)t} \end{bmatrix}.    (14)

The minimization in (11) for a fixed sub-vector pair (i_m, j_m) can be recast in terms of S_m^{(i,j)} and the 2h x 2h orthonormal matrix E_m,

E_m = \arg\min_{E \in \Omega_{2h \times 2h}} |\mathrm{diag}(E^t S_m^{(i,j)} E)|,    (15)

where \Omega_{2h \times 2h} is the set of all valid 2h x 2h orthonormal transforms. In practice, the optimization of E_m is precisely the same problem as the scalar SMT design presented in Sec. II. Once E_m is selected, we partition it into four h x h blocks,

E_m = \begin{bmatrix} E_m^{(1,1)} & E_m^{(1,2)} \\ E_m^{(2,1)} & E_m^{(2,2)} \end{bmatrix},

and then we obtain the transform T_m using the Kronecker product as

T_m = J^{(i,i)} \otimes E_m^{(1,1)} + J^{(i,j)} \otimes E_m^{(1,2)} + J^{(j,i)} \otimes E_m^{(2,1)} + J^{(j,j)} \otimes E_m^{(2,2)} + I_{p \times p} - (J^{(i,i)} + J^{(j,j)}) \otimes I_{h \times h},    (16)

where J^{(i,j)} is an L x L matrix given by

[J^{(i,j)}]_{i'j'} = \begin{cases} 1 & \text{if } i' = i \text{ and } j' = j \\ 0 & \text{otherwise.} \end{cases}    (17)

Fig. 3(a) illustrates the relationship between the 2h x 2h orthonormal transform E_m and the block-wise sparse, p x p orthonormal transform T_m: the four blocks of E_m are inserted in the appropriate block locations to form the larger, block-sparse matrix T_m.
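The block embedding in (16)-(17) amounts to copying the four h x h blocks of E_m into the (i_m, i_m), (i_m, j_m), (j_m, i_m), and (j_m, j_m) block positions of an identity matrix. A small sketch, with an illustrative helper name and 0-based sensor indices:

```python
import numpy as np

def embed_pairwise_transform(E_m, i, j, L, h):
    """Build the block-sparse p x p transform T_m of Eq. (16) from the
    2h x 2h pairwise transform E_m acting on sub-vectors i and j."""
    p = L * h
    T = np.eye(p)
    ri = slice(i * h, (i + 1) * h)   # rows/cols of sub-vector i
    rj = slice(j * h, (j + 1) * h)   # rows/cols of sub-vector j
    T[ri, ri] = E_m[:h, :h]          # E_m^(1,1)
    T[ri, rj] = E_m[:h, h:]          # E_m^(1,2)
    T[rj, ri] = E_m[h:, :h]          # E_m^(2,1)
    T[rj, rj] = E_m[h:, h:]          # E_m^(2,2)
    return T

# Applying T_m^t to x changes only the 2h coordinates of the selected pair,
# which is what allows the transform to be computed at one of the two sensors.
```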

Fig. 2: (a) In the product x-tilde = T_m^t x, the p x p block-wise sparse transform T_m operates on the p-dimensional vector x, changing only the 2h components associated with the h-dimensional sub-vectors x^{(i)}, x^{(j)} (shaded). (b) Scalar SMT decorrelation, x-tilde = E^t x. Each E_k plays the role of a decorrelating butterfly, operating on a single pair of coordinates. (c) 8-point FFT, seen as a particular case of the scalar SMT in which the butterflies are constrained in their ordering and rotation angles. (d) Vector SMT decorrelation, x-tilde = T^t x, with each T_m decorrelating a sub-vector pair of x instead of a single coordinate pair. Each T_m is an instance of the scalar SMT with decorrelating butterflies operating only on the coordinates of a single pair of sub-vectors.

The overall change in the log likelihood in (8) due to applying T_m to X_m, maximized with respect to \hat{\Lambda}(T_m), is given by (see App. A)

\Delta \log p_{(T_m,\hat{\Lambda}(T_m))}(X_m) = -\frac{n}{2} \log \frac{|\mathrm{diag}(T_m^t S_m T_m)|}{|\mathrm{diag}(S_m)|} = -\frac{n}{2} \log \frac{|\mathrm{diag}(E_m^t S_m^{(i,j)} E_m)|}{|\mathrm{diag}(S_m^{(i,j)})|} = -\frac{n}{2} \log(1 - F_{i_m j_m}),    (18)

where we introduce the concept of a correlation score, F_{i,j}, defined by

F_{i,j} = 1 - \frac{|\mathrm{diag}(E_m^t S_m^{(i,j)} E_m)|}{|\mathrm{diag}(S_m^{(i,j)})|}.

In App. B, we show that the correlation score generalizes the concept of the correlation coefficient to pairs of random vectors and derive its main properties. The pair of sub-vectors with the largest value of F_{ij} produces the largest increase in the log likelihood in (18). Therefore, we use the maximum value of F_{ij} as the criterion for selecting the pair (i_m, j_m) during the design of \hat{T}_m in (11). Finally, the algorithm in Fig. 3(b) summarizes this greedy procedure for designing the vector SMT.

C. The Vector SMT Design with Communication Energy Constraints

We extend the vector SMT design of Sec. III-B to account for the communication energy required for distributed decorrelation in a sensor network. When each T_m operates on x^{(i_m)} and x^{(j_m)} in a sensor network, it requires an amount \mathcal{E}(h, i_m, j_m) of energy for communication. In a scenario with a constrained energy budget, selecting sensors i_m and j_m based on the largest F_{ij} can be prohibitive if these sensors are several hops apart in the network. We therefore augment the likelihood in (8) with a linear penalization term associated with the total communication energy required for distributed decorrelation. The augmented log likelihood is given by

L_{(T,\Lambda)}(X) = \log p_{(T,\Lambda)}(X) - \mu \sum_{m=1}^{M} \mathcal{E}(h, i_m, j_m).    (19)

The parameter \mu has units of log likelihood per unit energy and controls the weight given to the communication energy when maximizing the likelihood. When \mu = 0, the design becomes the unconstrained vector SMT design of Sec. III-B. When we apply T_m to X_m and maximize (19) with respect to \hat{\Lambda}(T_m), the overall change in the augmented likelihood is given by

\Delta L(X_m) = L_{(T_m,\hat{\Lambda}(T_m))}(X_m) - L_{(I,\hat{\Lambda}(I))}(X_m) = -\frac{n}{2} \log \frac{|\mathrm{diag}(T_m^t S_m T_m)|}{|\mathrm{diag}(S_m)|} - \mu \mathcal{E}(h, i_m, j_m) = -\frac{n}{2} \log(1 - F_{i_m j_m}) - \mu \mathcal{E}(h, i_m, j_m).    (20)

Therefore, when designing \hat{T}_m with energy constraints, we select the pair of sub-vectors (i_m, j_m) with the smallest value of (1 - F_{i_m,j_m}) e^{2\mu \mathcal{E}(h, i_m, j_m)/n}, i.e., the pair (i_m, j_m) that simultaneously maximizes the correlation score F_{ij} and minimizes the communication energy penalty \mu \mathcal{E}(h, i_m, j_m), so as to increase the augmented log likelihood in (20) by the largest amount.
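The following is a sketch of the correlation score under (18) and of the energy-penalized pair selection in (20). For simplicity it diagonalizes the pairwise covariance exactly instead of designing E_m with the scalar SMT, and the helper names are illustrative.

```python
import numpy as np

def correlation_score(Xi, Xj):
    """Correlation score F_ij between two h x n blocks of training data.
    Uses F = 1 - |diag(E^t S E)| / |diag(S)| with S the 2h x 2h pairwise
    sample covariance of Eq. (14); here E is taken as the exact eigenvector
    matrix of S for illustration (assumes n >= 2h so that S is full rank)."""
    n = Xi.shape[1]
    Z = np.vstack([Xi, Xj])                  # stacked pair, Eq. (14)
    S = Z @ Z.T / n
    eigvals, _ = np.linalg.eigh(S)
    num = np.prod(eigvals)                   # |diag(E^t S E)| for the exact E
    den = np.prod(np.diag(S))                # |diag(S)|
    return 1.0 - num / den

def select_pair(F, energy, mu, n):
    """Energy-penalized selection of Sec. III-C: pick (i, j) minimizing
    (1 - F_ij) * exp(2*mu*energy_ij / n); mu = 0 recovers the unconstrained
    rule of picking the largest F_ij."""
    L = F.shape[0]
    best, best_cost = None, np.inf
    for i in range(L):
        for j in range(i + 1, L):
            cost = (1.0 - F[i, j]) * np.exp(2.0 * mu * energy[i, j] / n)
            if cost < best_cost:
                best, best_cost = (i, j), cost
    return best
```

For large 2h the determinant ratio should be computed in the log domain, but the structure of the score is unchanged.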
D. Model Order Identification

Let M_M be a vector SMT model with decorrelating transform T = \prod_{m=1}^{M} T_m. Here, we discuss three alternatives for selecting the model order parameter M.

1) Fixed Maximum Energy: We select M such that the total energy required for the distributed decorrelation T^t x does not exceed some fixed threshold, i.e., \sum_{m=1}^{M} \mathcal{E}(h, i_m, j_m) \le \mathcal{E}_{max}. This threshold is fixed based on a pre-established maximum energy budget allowed for the distributed decorrelation.

The vector SMT design algorithm (Fig. 3(b)):

    // Initialization
    forall pairs 1 <= i < j <= L do
        S^(i,j) <- (1/n) [X^(i); X^(j)] [X^(i)t  X^(j)t]
        E <- ComputeScalarSMT(S^(i,j))
        F_ij <- 1 - |diag(E^t S^(i,j) E)| / |diag(S^(i,j))|
    end
    // Main loop
    for m = 1, ..., M do
        (i, j) <- arg max_(i,j) F_ij
        E_m <- ComputeScalarSMT(S^(i,j))
        T_m <- MapToPairwiseTransform(E_m, i, j)
        Update the matrix F_ij
        S^(i,j) <- E_m^t S^(i,j) E_m
    end

Fig. 3: (a) Mapping from the 2h x 2h orthonormal matrix E_m to the p x p block-wise sparse matrix T_m associated with the (i_m, j_m) sub-vector pair. (b) The vector SMT design algorithm.

2) Cross-Validation: We partition the p x n data sample matrix X into K p x n_k matrices X^{(k)}, X = [X^{(1)} ... X^{(K)}], and define X^{(-k)} as the matrix containing the samples of X that are not in X^{(k)}. For each k = 1, ..., K, we design M_M from X^{(-k)} and compute its log likelihood over X^{(k)}, i.e., \log p_{M_M(X^{(-k)})}(X^{(k)}). We select the M that maximizes the average cross-validated log likelihood [39],

L(M_M) = \frac{1}{K} \sum_{k=1}^{K} \log p_{M_M(X^{(-k)})}(X^{(k)}).    (21)

3) Minimum Description Length (MDL) Criterion: Based on the MDL principle [4], [4], [4], we select M such that the model M_M has the shortest encoding, among all models, of both its parameters and the sample matrix X. The total description length of M_M in nats is given by

l_M = -\log p_{M_M}(X) + \frac{MK}{2}\log(pn) + 2MK\log(h) + 2M\log(L),    (22)

where -\log p_{M_M}(X) nats are used to encode X, (MK/2)\log(pn) nats are used to encode the MK real-valued angles of the Givens rotations across all M pairwise transforms, 2MK\log(h) nats are used for the MK rotation coordinate pairs, and finally 2M\log(L) nats are used for the indices of the sub-vector pairs of the M pairwise transforms. Our goal is then to select the M that minimizes l_M in (22). Initially, l_M decreases with M because it is dominated by the likelihood term, -\log p_{M_M}(X). However, when M is large, the other terms dominate l_M, causing it to increase as M increases. Therefore, we select the M that minimizes l_M by stopping at the first value of M such that

l_{M+1} - l_M = -\log\frac{p_{M_{M+1}}(X)}{p_{M_M}(X)} + \frac{K}{2}\log(pn) + 2K\log(h) + 2\log(L) = \frac{n}{2}\log(1 - F_{i_m,j_m}) + \frac{K}{2}\log(pn) + 2K\log(h) + 2\log(L) \ge 0.

This condition leads to the following new stop condition for the main loop of the algorithm in Fig. 3(b),

F_{i_m,j_m} \le 1 - e^{-(K\log(pn) + 4K\log(h) + 4\log(L))/n}.    (23)

It is easy to generalize l_M in (22) to the case where each pairwise transform T_m has a different number of Givens rotations, K_m, resulting in

l_M^{(general)} = -\log p_{M_M}(X) + \sum_{m=1}^{M} \frac{K_m}{2}\log(pn) + \sum_{m=1}^{M} 2K_m\log(h) + 2M\log(L).    (24)

Finally, when l_{M+1}^{(general)} \ge l_M^{(general)} is satisfied, the new stop condition for the loop in Fig. 3(b) is given by

F_{i_m,j_m} \le 1 - e^{-(K_{M+1}\log(pn) + 4K_{M+1}\log(h) + 4\log(L))/n}.    (25)
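Two of these model-order rules are simple enough to sketch directly: the fixed-energy budget of alternative 1 and the MDL stop test of (25). The helpers below are illustrative; pairwise_energy is the hop-count energy model sketched after Eq. (7).

```python
import math

def mdl_should_stop(F_best, K_next, p, n, h, L):
    """MDL stop test of Eq. (25): stop adding pairwise transforms when the best
    available correlation score no longer pays for the description length of one
    more transform with K_next Givens rotations."""
    threshold = 1.0 - math.exp(-(K_next * math.log(p * n)
                                 + 4.0 * K_next * math.log(h)
                                 + 4.0 * math.log(L)) / n)
    return F_best <= threshold

def max_order_within_budget(pair_sequence, h, adjacency, budget):
    """Alternative 1 of Sec. III-D: the largest M whose cumulative communication
    energy (Eq. (7)) stays within a fixed budget.  pair_sequence lists the (i, j)
    pairs in the order the greedy design selected them."""
    total, M = 0, 0
    for (i, j) in pair_sequence:
        cost = pairwise_energy(h, i, j, adjacency)
        if total + cost > budget:
            break
        total, M = total + cost, M + 1
    return M
```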
probability of false alar, p(h ;H ), which is controlled by the threshold Γ We 6

We incorporate all the constant terms into a new threshold, \eta, such that the test in (27) becomes

D_R(x) = x^t R^{-1} x \ge \eta.    (28)

If we further assume that R = T \Lambda T^t, where T and \Lambda are orthonormal and diagonal matrices respectively, the test in (28) can be written as a weighted sum of p uncorrelated coordinates,

D_\Lambda(\tilde{x}) = \sum_{i=1}^{p} \frac{\tilde{x}_i^2}{\lambda_i} \ge \eta,    (29)

where \tilde{x} = T^t x and \lambda_i = [\Lambda]_{ii} (1 \le i \le p). Finally, because the sum in (29) involves only independent terms, it can be evaluated distributedly across a sensor network while requiring minimal communication.

B. Ellipsoid Volume as a Measure of Detection Accuracy

The ellipsoid volume approach [3], [43], [46] measures anomaly detection accuracy without requiring labeled anomalous samples. Because anomalies are rare and loosely defined events, we often lack enough test samples labeled as anomalous to estimate the probability of detection, p(H_1; H_1), required for ROC analysis [3]. Instead of relying on anomalous samples, the ellipsoid volume approach seeks to measure detection accuracy by characterizing how well a covariance estimate \hat{R} models the typical data samples. It evaluates the volume of the region within the ellipsoid x^t \hat{R}^{-1} x \le \eta for a certain probability of false alarm controlled by \eta. This volume is given by

V(\hat{R}, \eta) = \frac{\pi^{p/2}}{\Gamma(1 + p/2)} \, \eta^{p/2} \, |\hat{R}|^{1/2}.    (30)

We use V(\hat{R}, \eta) as a proxy for the probability of missed detection, p(H_0; H_1). Smaller values of V(\hat{R}, \eta) indicate smaller chances of an anomalous sample lying within this ellipsoid, and therefore being wrongly classified as typical. Therefore, for a fixed probability of false alarm, smaller values of V(\hat{R}, \eta) indicate higher detection accuracy.
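A sketch of the decorrelated test statistic (29) and of the ellipsoid volume (30), computed in the log domain for numerical stability, is shown below; the function names are illustrative.

```python
import math
import numpy as np

def detection_statistic(x, T, lam):
    """Decorrelate and score one aggregated sample (Eq. (29)).
    T   : p x p orthonormal decorrelating transform (scalar or vector SMT).
    lam : length-p vector of estimated variances [Lambda]_ii."""
    x_tilde = T.T @ x
    return np.sum(x_tilde**2 / lam)        # declare anomalous if >= eta

def ellipsoid_log_volume(lam, eta, p):
    """Log of Eq. (30) for a diagonalized estimate R_hat = T diag(lam) T^t,
    so that log|R_hat| = sum(log lam)."""
    return (0.5 * p * math.log(math.pi) - math.lgamma(1.0 + 0.5 * p)
            + 0.5 * p * math.log(eta) + 0.5 * np.sum(np.log(lam)))
```

In a network, each sensor can evaluate the partial sum of x-tilde_i^2 / lambda_i for the coordinates it caches and forward only that scalar toward the root, which is why the test in (29) requires minimal communication.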
V. EXPERIMENTAL RESULTS

We provide experimental results using simulated and real data to quantify the effectiveness of our proposed method. In all experiments, we assume that communications occur between sensors connected in a hierarchical network with binary tree topology, and that communication of one scalar value between adjacent sensors uses one unit of energy. We compare the vector SMT decorrelation with two other approaches for processing the sensor outputs, a centralized one and an independent one. In the centralized approach, all sensors communicate their h-dimensional vector outputs to the root of the tree. This approach is very communication intensive, but once all the data is centrally located, any decorrelation algorithm can be used to decorrelate x. We choose the scalar SMT algorithm because it has been shown to provide accurate decorrelation from limited training data, since it approximates the maximum likelihood estimate. In the independent approach, each sensor computes a partial likelihood of its output independently and communicates it to the root of the tree. The root sensor adds the partial likelihoods from all sensors and makes a detection decision without decorrelating the sensor outputs. This requires the least communication among all the approaches compared.

Processing/Decorrelation Methods
    Method                     Algorithm    Communication                              Decorrelation
    Vector SMT (distributed)   Vector SMT   Between pairs of nodes / caching in net.   Sub-vector pairs
    Centralized                Scalar SMT   Vector outputs to centralized node         Coordinate pairs at a single node
    Independent                None         Partial likelihoods to centralized node    None

Fig. 4: The experimental setup. (a) Summary of the approaches to sensor output decorrelation compared and their main properties. (b) Steps for decorrelation and anomaly detection used in our experimental results: each sensor encodes its output as an h-dimensional vector using PCA (experiments with artificial data replace the sensor vector outputs with artificially generated random vector data); the outputs are processed in the network before a detection decision is made.

Fig. 4(a) summarizes these approaches in terms of their main computation and communication characteristics. Finally, Fig. 4(b) shows the event detection simulation steps used by a camera network in several of our experiments. Each camera sensor records an image and encodes it into an h-dimensional vector output using principal component analysis (PCA). We process the outputs using one of the approaches in Fig. 4(a) before making a detection decision.

A. Simulation experiments using artificial model data

In these experiments, we study how the vector SMT model accuracy changes with (i) different choices of decorrelating transform used as the pairwise transform between two sensor outputs, and (ii) different values of the energy constraint parameter \mu used in the constrained design of Sec. III-C. We simulate a network with L = 3 sensors, in which each sensor i outputs a vector x^{(i)} with h = 5 dimensions. These sensor vector outputs are correlated. Fig. 5 shows how we generate a data sample x aggregated from correlated sensor outputs x^{(i)}, i = 1, ..., L. First, we draw each x^{(i)} independently from the N(0, R) distribution, with the h x h covariance matrix [R]_{rs} = \rho^{|r-s|}, where \rho = 0.7. Then we perform a random permutation of the individual coordinates of x across all x^{(i)}, i = 1, ..., L, to spread correlations among all sensor outputs. Finally, each x^{(i)} is the output of a sensor i interconnected in a hierarchical network with binary tree topology.

Fig. 6 shows the vector SMT model accuracy vs. the communication energy required for decorrelation for three different choices of pairwise transform: scalar SMT with a fixed number of Givens rotations, scalar SMT with the MDL criterion, and Karhunen-Loève (the eigenvector matrix from the exact diagonalization of the pairwise sample covariance).
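A sketch of the Fig. 5 data-generation procedure follows; it assumes \rho = 0.7 and a single fixed permutation, and the parameter defaults are illustrative.

```python
import numpy as np

def generate_samples(L, h, n, rho=0.7, seed=0):
    """Generate n samples following the Fig. 5 procedure: each sensor output is
    drawn independently from N(0, R) with [R]_rs = rho**|r-s|, then one fixed
    random permutation of all p = L*h coordinates spreads correlation across
    the sensor outputs."""
    rng = np.random.default_rng(seed)
    p = L * h
    idx = np.arange(h)
    R = rho ** np.abs(idx[:, None] - idx[None, :])   # h x h covariance
    chol = np.linalg.cholesky(R)
    # Independent h-dimensional draws for every sensor and sample.
    X = np.concatenate([chol @ rng.standard_normal((h, n)) for _ in range(L)], axis=0)
    perm = rng.permutation(p)                        # spread correlations
    return X[perm, :]                                # p x n sample matrix
```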

Fig. 5: Generation of a data sample x aggregated from correlated h-dimensional sensor outputs x^{(i)}, i = 1, ..., L, using an artificial model. First we draw each x^{(i)} independently from the N(0, R) distribution, with [R]_{rs} = \rho^{|r-s|}. Then, we permute individual coordinates of x across all x^{(i)}, i = 1, ..., L, to spread correlations among all sensor outputs. Each x^{(i)} is the output of a sensor i connected to other sensors in a hierarchical network with binary tree topology.

Fig. 6: Vector SMT model accuracy vs. communication energy consumption using training data samples from an artificial model. Comparison of different vector SMT pairwise transforms for a range of communication energies: (a) average log-likelihood over the test samples; (b) ellipsoid log-volume covering 99% of the test samples (1% false alarm rate). The choice of scalar SMT-MDL produces the best increase in accuracy, measured by both metrics.

Fig. 7: Comparison of vector SMT energy constraint parameter values for a range of communication energies using training data samples from an artificial model: (a) average log-likelihood over the test samples; (b) ellipsoid log-volume covering 99% of the test samples (1% false alarm rate). Vector SMT models with larger \mu are the most accurate for small fixed energy values. For large energy values, the constrained models tend to exhibit sub-optimal accuracies compared to the unconstrained vector SMT.

We measure accuracy by the average log-likelihood of the vector SMT model over the testing samples (Fig. 6(a)), and by the ellipsoid log-volume covering 99% of the testing samples, i.e., for a 1% false alarm rate (Fig. 6(b)). In general, the model accuracy improves to an optimal level and then starts to decrease as more energy is spent on pairwise transforms. This decrease in accuracy happens because vector SMT models with a large number of pairwise transforms tend to overfit the training data. For scalar SMT-MDL pairwise transforms, the MDL criterion adjusts the number of Givens rotations for each new pairwise transform according to an estimate of the correlation still present in the data [], helping to prevent overfitting. Since it is overall the most accurate, the scalar SMT-MDL is our pairwise transform of choice in all other experiments in this paper.

Fig. 7 shows model accuracy vs. communication energy for three choices of the energy constraint parameter \mu. The accuracy is measured by the average model log-likelihood (Fig. 7(a)) and the ellipsoid log-volume covering 99% of the testing samples (Fig. 7(b)). The parameter \mu selects the trade-off between model accuracy and energy consumption. For a small fixed energy value, the vector SMT with the largest \mu value produces the most accurate model. For large values of energy, the constrained vector SMT accuracy tends to level out at sub-optimal values while the unconstrained vector SMT has the highest accuracy.
B. Simulation experiments using artificial moving sphere images

In this experiment, we apply the vector SMT to decorrelate two simultaneous camera views for anomaly detection. We generate artificial images of a 3D sphere placed at random positions along two straight diagonal lines over a plane, as illustrated in Figs. 8(a) and 8(b). We refer to sphere positions along the line in Fig. 8(a) as typical ones, and to positions along the mirrored diagonal line in Fig. 8(b) as anomalous ones. Two cameras (L = 2) monitor the sphere locations in the 3D region. Fig. 8(c) shows the top (X-Y) view captured by camera 1, while Fig. 8(d) shows the side (X-Z) view captured by camera 2. Note that it is impossible to tell anomalous from typical sphere positions by looking at the views in Figs. 8(c) and 8(d) separately. Instead, one needs to process both views together to extract useful discriminant information. Each camera outputs a vector of h dimensions containing its largest PCA components. The joint output from both cameras forms a sample. We use typical samples to train the detectors using vector SMT decorrelation and independent processing of the views. During testing, we use samples disjoint from the training set, some typical and others anomalous.

Figs. 8(e) and 8(f) compare the detection accuracy using both independent processing and the vector SMT to decorrelate the joint camera outputs. Both the ROC analysis (Fig. 8(e)) and the ellipsoid log-volume coverage plot (Fig. 8(f)) suggest that when the two views are processed independently, the detector cannot distinguish anomalous from typical samples. However, when the vector SMT decorrelates both views, anomaly detection is very accurate. Fig. 9 shows sets of five eigen-images associated with the largest eigenvalues for both the independent (Fig. 9(a)) and the vector SMT (Fig. 9(b)) processing approaches. In the independent processing case, each eigen-image is associated with a single camera view. In contrast, the vector SMT processing produces eigen-images that each model both camera views jointly.
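A sketch of the per-camera PCA encoding used in Fig. 4(b) and in this experiment; the helper names are ours, and the encoder is fit from vectorized training frames.

```python
import numpy as np

def fit_pca_encoder(frames, h):
    """Fit a per-camera PCA encoder from training frames.
    frames : n x d matrix, one vectorized image per row.
    Returns the mean image and the top-h principal directions."""
    mean = frames.mean(axis=0)
    # Economy SVD of the centered data; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, Vt[:h]

def encode_frame(frame, mean, components):
    """Project one vectorized image onto its h largest PCA components,
    producing the camera's h-dimensional vector output x^(i)."""
    return components @ (frame - mean)
```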

Fig. 8: Simulated 3D space with a bouncing sphere. The sphere takes random positions along the line indicated by the double arrow: (a) typical behavior; (b) anomalous behavior. The camera views: (c) top (X-Y dimensions); (d) side (X-Z dimensions). The detection accuracies using independent processing and vector SMT joint processing: (e) ROC curve; (f) coverage plot with log-volume of ellipsoid vs. probability of false alarm.

Fig. 9: Eigen-images of the moving sphere experiment. The five eigen-images (columns) are associated with the five largest eigenvalues in decreasing order (left to right). Each eigen-image has two views (top and bottom rows). (a) When the camera views are processed independently, each eigenvector models a single view; (b) when the camera views are processed jointly using the vector SMT, each eigenvector models both views together.

C. Simulation experiments using artificial 3D sphere cloud images

In this experiment, we monitor clouds of spheres using twelve simultaneous camera views for the purpose of anomaly detection. We artificially generate sphere clouds randomly positioned in the 3D space, each containing 3 spheres. There are two types of clouds according to the sphere position distribution: (i) typical: the sphere positions are generated from the N(0, I_{3x3}) distribution, but only positions with distance from the origin exceeding a fixed threshold are selected, so that the resulting cloud is hollow; and (ii) anomalous: the random positions of the spheres are drawn from the N(0, I_{3x3}) distribution without further selection, so that the resulting cloud is dense. We monitor the same 3D cloud using L = 12 different cameras from different viewpoints, and each camera encodes its output using PCA to a vector of h dimensions. Fig. 10 shows the twelve camera views for both a typical cloud sample (Fig. 10(a)) and an anomalous one (Fig. 10(b)). Each data sample is formed by aggregating the twelve camera outputs. We generate typical samples to train the detectors, and another set of test samples, some typical and some anomalous.
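A sketch of the sphere-cloud generation just described; the rejection radius is an illustrative value, since the paper only states that a fixed threshold is used.

```python
import numpy as np

def sample_cloud(num_spheres, anomalous, radius_threshold=1.0, rng=None):
    """Draw 3D sphere-center positions for one cloud sample.
    Typical clouds keep only positions farther than radius_threshold from the
    origin (hollow); anomalous clouds keep every draw (dense)."""
    rng = rng or np.random.default_rng()
    positions = []
    while len(positions) < num_spheres:
        pos = rng.standard_normal(3)              # N(0, I_3x3)
        if anomalous or np.linalg.norm(pos) > radius_threshold:
            positions.append(pos)                 # rejection step for typical clouds
    return np.array(positions)
```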
Fig. 11: Anomaly detection accuracy using the sphere cloud data: (a) ROC analysis; (b) log-volume of ellipsoid vs. probability of false alarm (vector SMT decorrelation yields the most accurate detection results for all false alarm rates); (c) log-volume of ellipsoid for a 1% false alarm rate, i.e., 99% coverage, vs. communication energy.

Fig. 11 shows the anomaly detection accuracy based on ROC analysis (Fig. 11(a)) and the log-volume of the ellipsoid (Fig. 11(b)). Among all methods compared, detection using independent processing is the least accurate, while both the centralized processing using the scalar SMT and the distributed processing using the vector SMT lead to high detection accuracies. Intuitively, as the views in Fig. 10 suggest, it is difficult to distinguish between typical and anomalous samples by processing each view independently. Instead, the information that helps distinguish an anomalous cloud from the typical ones is contained in the joint view of the camera images.

Fig. 11(c) shows the ellipsoid log-volume for a 1% false alarm rate vs. the communication energy for the different approaches compared. Independent processing is the least accurate while requiring the minimum energy among all approaches. The centralized approach is very accurate, but it requires significant communication energy. In the vector SMT decorrelation, each pairwise decorrelation increases the detection accuracy while consuming more energy. There is a trade-off between detection accuracy and energy consumption, and one can choose the number of pairwise transforms to apply based on the desired accuracy and the available energy budget. Finally, detection is more accurate when using vector SMT decorrelation compared to the scalar SMT for the same energy consumption. This difference in accuracy is due to the inherent constraint of the vector SMT decorrelating pairs of vectors, which tends to produce better models of a distribution when a limited number of training samples is available.

D. Simulation experiments using real multi-camera images

Fig. 12 shows the L = 8 camera views of a courtyard, constructed from video sequences from the UCR Videoweb Activities Dataset [47]. Each camera records a video sequence of approximately 4 min, with 3 frames/sec, generating a total of 76 frames. The sequences are synchronized, so that multiple cameras capture events simultaneously.

Fig. 10: The twelve camera views of a 3D sphere cloud sample: (a) a typical sample (hollow cloud); (b) an anomalous sample (dense cloud). It is difficult to discriminate anomalous from typical samples by processing each view independently. Instead, the discriminant information is contained in the joint camera views.

Fig. 12: The courtyard dataset from the UCR Videoweb Activities Dataset: eight cameras, with ids 1 to 8 from left to right, monitor a courtyard from different viewpoints. Several activities in the courtyard are captured simultaneously by several cameras.

TABLE I: Correlation score values for all pairs of views in the courtyard dataset. The correlation score measures the correlation of camera outputs between pairs of camera views. Pairs of cameras capturing the same events simultaneously have the highest correlation scores.

We subsample frames from the full sequence, and use part of the selected samples to compute the encoding PCA transforms for each camera view. The final courtyard dataset has 734 samples, with each view encoded in an h-dimensional sub-vector of the aggregated p-dimensional sample. Table I shows the correlation score values for all view pairs. Pairs of highly correlated views, capturing mostly the same events (such as the pairs involving camera 6), receive higher score values than weakly correlated view pairs. The events captured by one of the cameras are unrelated, and therefore uncorrelated, to the events captured by the other cameras, resulting in negligible correlation score values for its pairs.

Fig. 13 shows two eigen-images associated with the two largest eigenvalues for both the independent and vector SMT approaches. In the independent processing case (Fig. 13(a)), each eigen-image corresponds to a single camera view, containing no information regarding the relationship between different views. On the other hand, the vector SMT eigen-images (Fig. 13(b)) contain joint information of the correlated views. Since the uncorrelated camera view is not correlated with any other view, it does not appear together with others in the same eigen-image.

Fig. 14 compares the accuracy of all approaches measured by the log-volume of the ellipsoid covering the test samples. We split the samples into a training set with 300 samples and a test set with 434 samples. Fig. 14(a) shows the ellipsoid log-volume computed for all false alarm rates. The vector SMT is the most accurate approach, with its volumes being the smallest across all false alarm rates. The vector SMT volumes are also smaller than the scalar SMT volumes. As discussed in Sec. V-C, the vector SMT is more accurate than the scalar SMT because of the nature of its constrained decorrelating transform when trained with a small training set. Fig. 14(b) shows results of the same experiment as in Fig. 14(a), with the vector SMT model order selected so that the distributed decorrelation consumes only 5% of the energy required for the centralized approach. Fig. 14(c) shows the ellipsoid log-volume for a fixed false alarm rate (1%) vs. the communication energy. We observe the same trends as in the sphere cloud experiment of Sec. V-C: the independent approach has low accuracy while requiring low communication energy; the centralized decorrelation is highly accurate, but it requires large amounts of communication energy; and the vector SMT increases the detection accuracy after each pairwise transform. Finally, the vector SMT approach has accuracy similar to the centralized approach for all false alarm rates while requiring significantly less communication energy.
Figs. 15(a)-(c) show ROC curves for the detection of anomalous samples generated by an artificial 4-fold increase in the largest component of the vector output of a single camera view, injected into a correlated view, into view 6, and into the uncorrelated view, respectively. We use typical samples to learn the decorrelating transform and the remaining samples for testing. Since view 6 and the other correlated view are correlated with further views (see Table I), detection of anomalies in these two views is accurate when we decorrelate the views using the vector and scalar SMT approaches, and very inaccurate when we process the views independently. Because the remaining view is uncorrelated with the other views, decorrelation does not help improve detection accuracy there, and all approaches are inaccurate.

Figs. 15(d)-(f) show the ROC curves for detection of what we call the Ocean's Eleven anomaly, injected into the same three camera views, respectively.

Fig. 13: Two eigen-images from the eight camera views of the courtyard dataset. Each eigen-image has eight views (columns) associated with it. (a) Independent processing of camera views: each eigen-image corresponds to a single view and does not contain correlation information among multiple views. (b) Joint processing modeled by the vector SMT: each eigen-image contains joint information of all correlated views.

Fig. 15: ROC analysis of detection accuracy. (a)-(c) Artificially generated anomalies created by a 4-fold increase in the largest eigenvalue of a single view, for a correlated view, view 6, and the uncorrelated view, respectively. (d)-(f) Ocean's Eleven anomalies, generated by swapping images of a single camera view between samples, for the same three views, respectively. Decorrelation improves detection accuracy when anomalies appear in correlated camera views; when the anomaly is inserted in an uncorrelated view, decorrelation methods do not improve the detection accuracy. (g) People coalescing in the middle of the courtyard: the scalar and vector SMTs are highly accurate for small probabilities of false alarm, with the vector SMT consuming approximately 6% of the communication energy required for the scalar SMT.

Fig. 14: Detection accuracy measured by the ellipsoid log-volume for the courtyard data set. Coverage plots showing the log-volume vs. probability of false alarm: (a) model order M = 7, matching the energy of centralized processing; (b) model order M = 4, matching 5% of the energy consumed by the centralized processing; (c) log-volume vs. communication energy for a fixed probability of false alarm. When the communication energy is equal to the level required to execute the scalar SMT at a centralized node, the vector SMT has better detection accuracy. When the energy level is 5% of the level required by the centralized approach, the vector SMT has similar accuracy.

This anomaly is generated by swapping images of a single view between two samples captured at different instants. We refer to it as the Ocean's Eleven anomaly because of its resemblance to the anomaly created to trick the surveillance cameras during the casino robbery in the Ocean's Eleven film [4]. Since view 6 and the other correlated view are correlated with further views, detection is accurate when we decorrelate the views with the scalar and vector SMTs, and very inaccurate when we process the views independently. Because the remaining view is uncorrelated with the other views, decorrelation does not help improve detection accuracy there, and all approaches are inaccurate.

Fig. 15(g) shows the ROC curves for detection of a suspicious (anomalous) activity in which people coalesce at the center of the courtyard. Fig. 16 shows the typical and anomalous samples used in this experiment. We select samples where a group of people coalesces at the center of the courtyard and label them as anomalous, while selecting other samples where the group does not coalesce and labeling them as typical. We use another set of typical samples to train the vector SMT. The vector SMT decorrelation in this experiment consumes 6% of the communication energy required for the scalar SMT. Detection is very accurate when using the vector and scalar SMTs for view decorrelation, and inaccurate when processing the views independently, especially for low probabilities of false alarm. Similarly to the detection of dense clouds (see Sec. V-C), it is difficult to detect people coalescing when processing camera views independently. Instead, one needs to consider the views jointly for good detection accuracy.


Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Kernel based Object Tracking using Color Histogram Technique

Kernel based Object Tracking using Color Histogram Technique Kernel based Object Tracking using Color Histogra Technique 1 Prajna Pariita Dash, 2 Dipti patra, 3 Sudhansu Kuar Mishra, 4 Jagannath Sethi, 1,2 Dept of Electrical engineering, NIT, Rourkela, India 3 Departent

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network 565 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 59, 07 Guest Editors: Zhuo Yang, Junie Ba, Jing Pan Copyright 07, AIDIC Servizi S.r.l. ISBN 978-88-95608-49-5; ISSN 83-96 The Italian Association

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance

Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance Decision-Theoretic Approach to Maxiizing Observation of Multiple Targets in Multi-Caera Surveillance Prabhu Natarajan, Trong Nghia Hoang, Kian Hsiang Low, and Mohan Kankanhalli Departent of Coputer Science,

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are, Page of 8 Suppleentary Materials: A ultiple testing procedure for ulti-diensional pairwise coparisons with application to gene expression studies Anjana Grandhi, Wenge Guo, Shyaal D. Peddada S Notations

More information

Principal Components Analysis

Principal Components Analysis Principal Coponents Analysis Cheng Li, Bingyu Wang Noveber 3, 204 What s PCA Principal coponent analysis (PCA) is a statistical procedure that uses an orthogonal transforation to convert a set of observations

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Statistical clustering and Mineral Spectral Unmixing in Aviris Hyperspectral Image of Cuprite, NV

Statistical clustering and Mineral Spectral Unmixing in Aviris Hyperspectral Image of Cuprite, NV CS229 REPORT, DECEMBER 05 1 Statistical clustering and Mineral Spectral Unixing in Aviris Hyperspectral Iage of Cuprite, NV Mario Parente, Argyris Zynis I. INTRODUCTION Hyperspectral Iaging is a technique

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008 LIDS Report 2779 1 Constrained Consensus and Optiization in Multi-Agent Networks arxiv:0802.3922v2 [ath.oc] 17 Dec 2008 Angelia Nedić, Asuan Ozdaglar, and Pablo A. Parrilo February 15, 2013 Abstract We

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA.

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA. Proceedings of the ASME International Manufacturing Science and Engineering Conference MSEC June -8,, Notre Dae, Indiana, USA MSEC-7 DISSIMILARIY MEASURES FOR ICA-BASED SOURCE NUMBER ESIMAION Wei Cheng,

More information

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Using a De-Convolution Window for Operating Modal Analysis

Using a De-Convolution Window for Operating Modal Analysis Using a De-Convolution Window for Operating Modal Analysis Brian Schwarz Vibrant Technology, Inc. Scotts Valley, CA Mark Richardson Vibrant Technology, Inc. Scotts Valley, CA Abstract Operating Modal Analysis

More information

Unsupervised Learning: Dimension Reduction

Unsupervised Learning: Dimension Reduction Unsupervised Learning: Diension Reduction by Prof. Seungchul Lee isystes Design Lab http://isystes.unist.ac.kr/ UNIST Table of Contents I.. Principal Coponent Analysis (PCA) II. 2. PCA Algorith I. 2..

More information

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements

Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements 1 Copressive Distilled Sensing: Sparse Recovery Using Adaptivity in Copressive Measureents Jarvis D. Haupt 1 Richard G. Baraniuk 1 Rui M. Castro 2 and Robert D. Nowak 3 1 Dept. of Electrical and Coputer

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks 1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS. Introduction When it coes to applying econoetric odels to analyze georeferenced data, researchers are well

More information

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse

More information

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information

Multivariate Methods. Matlab Example. Principal Components Analysis -- PCA

Multivariate Methods. Matlab Example. Principal Components Analysis -- PCA Multivariate Methos Xiaoun Qi Principal Coponents Analysis -- PCA he PCA etho generates a new set of variables, calle principal coponents Each principal coponent is a linear cobination of the original

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

MAXIMUM LIKELIHOOD BASED TECHNIQUES FOR BLIND SOURCE SEPARATION AND APPROXIMATE JOINT DIAGONALIZATION

MAXIMUM LIKELIHOOD BASED TECHNIQUES FOR BLIND SOURCE SEPARATION AND APPROXIMATE JOINT DIAGONALIZATION BEN-GURION UNIVERSITY OF TE NEGEV FACULTY OF ENGINEERING SCIENCE DEPARTENT OF ELECTRICAL AND COPUTER ENGINEERING AXIU LIKELIOOD BASED TECNIQUES FOR BLIND SOURCE SEPARATION AND APPROXIATE JOINT DIAGONALIZATION

More information

Fast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns

Fast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns Fast Structural Siilarity Search of Noncoding RNs Based on Matched Filtering of Ste Patterns Byung-Jun Yoon Dept. of Electrical Engineering alifornia Institute of Technology Pasadena, 91125, S Eail: bjyoon@caltech.edu

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits Correlated Bayesian odel Fusion: Efficient Perforance odeling of Large-Scale unable Analog/RF Integrated Circuits Fa Wang and Xin Li ECE Departent, Carnegie ellon University, Pittsburgh, PA 53 {fwang,

More information

A New Class of APEX-Like PCA Algorithms

A New Class of APEX-Like PCA Algorithms Reprinted fro Proceedings of ISCAS-98, IEEE Int. Syposiu on Circuit and Systes, Monterey (USA), June 1998 A New Class of APEX-Like PCA Algoriths Sione Fiori, Aurelio Uncini, Francesco Piazza Dipartiento

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Data-Driven Imaging in Anisotropic Media

Data-Driven Imaging in Anisotropic Media 18 th World Conference on Non destructive Testing, 16- April 1, Durban, South Africa Data-Driven Iaging in Anisotropic Media Arno VOLKER 1 and Alan HUNTER 1 TNO Stieltjesweg 1, 6 AD, Delft, The Netherlands

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models 2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

Convolutional Codes. Lecture Notes 8: Trellis Codes. Example: K=3,M=2, rate 1/2 code. Figure 95: Convolutional Encoder

Convolutional Codes. Lecture Notes 8: Trellis Codes. Example: K=3,M=2, rate 1/2 code. Figure 95: Convolutional Encoder Convolutional Codes Lecture Notes 8: Trellis Codes In this lecture we discuss construction of signals via a trellis. That is, signals are constructed by labeling the branches of an infinite trellis with

More information

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo

More information

Determining the Robot-to-Robot Relative Pose Using Range-only Measurements

Determining the Robot-to-Robot Relative Pose Using Range-only Measurements Deterining the Robot-to-Robot Relative Pose Using Range-only Measureents Xun S Zhou and Stergios I Roueliotis Abstract In this paper we address the proble of deterining the relative pose of pairs robots

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

Research Article Data Reduction with Quantization Constraints for Decentralized Estimation in Wireless Sensor Networks

Research Article Data Reduction with Quantization Constraints for Decentralized Estimation in Wireless Sensor Networks Matheatical Probles in Engineering Volue 014, Article ID 93358, 8 pages http://dxdoiorg/101155/014/93358 Research Article Data Reduction with Quantization Constraints for Decentralized Estiation in Wireless

More information

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA)

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA) Bayesian Learning Chapter 6: Bayesian Learning CS 536: Machine Learning Littan (Wu, TA) [Read Ch. 6, except 6.3] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theore MAP, ML hypotheses MAP learners Miniu

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Multi-view Discriminative Manifold Embedding for Pattern Classification

Multi-view Discriminative Manifold Embedding for Pattern Classification Multi-view Discriinative Manifold Ebedding for Pattern Classification X. Wang Departen of Inforation Zhenghzou 450053, China Y. Guo Departent of Digestive Zhengzhou 450053, China Z. Wang Henan University

More information

MULTIAGENT Resource Allocation (MARA) is the

MULTIAGENT Resource Allocation (MARA) is the EDIC RESEARCH PROPOSAL 1 Designing Negotiation Protocols for Utility Maxiization in Multiagent Resource Allocation Tri Kurniawan Wijaya LSIR, I&C, EPFL Abstract Resource allocation is one of the ain concerns

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Deflation of the I-O Series Some Technical Aspects. Giorgio Rampa University of Genoa April 2007

Deflation of the I-O Series Some Technical Aspects. Giorgio Rampa University of Genoa April 2007 Deflation of the I-O Series 1959-2. Soe Technical Aspects Giorgio Rapa University of Genoa g.rapa@unige.it April 27 1. Introduction The nuber of sectors is 42 for the period 1965-2 and 38 for the initial

More information

arxiv: v2 [cs.lg] 30 Mar 2017

arxiv: v2 [cs.lg] 30 Mar 2017 Batch Renoralization: Towards Reducing Minibatch Dependence in Batch-Noralized Models Sergey Ioffe Google Inc., sioffe@google.co arxiv:1702.03275v2 [cs.lg] 30 Mar 2017 Abstract Batch Noralization is quite

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

ZISC Neural Network Base Indicator for Classification Complexity Estimation

ZISC Neural Network Base Indicator for Classification Complexity Estimation ZISC Neural Network Base Indicator for Classification Coplexity Estiation Ivan Budnyk, Abdennasser Сhebira and Kurosh Madani Iages, Signals and Intelligent Systes Laboratory (LISSI / EA 3956) PARIS XII

More information

Support Vector Machines MIT Course Notes Cynthia Rudin

Support Vector Machines MIT Course Notes Cynthia Rudin Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance

More information