
An equalized global graph model-based approach for multi-camera object tracking

Weihua Chen*, Lijun Cao*, Xiaotang Chen, Member, IEEE, and Kaiqi Huang, Senior Member, IEEE

arXiv:1502.03532v2 [cs.CV] 19 Jul 2016

Abstract: Non-overlapping multi-camera visual object tracking typically consists of two steps: single camera object tracking and inter-camera object tracking. Most tracking methods focus on single camera object tracking, which happens in the same scene, while for real surveillance scenes, inter-camera object tracking is needed and single camera tracking methods cannot work effectively. In this paper, we try to improve the overall multi-camera object tracking performance by a global graph model with an improved similarity metric. Our method treats the similarities of single camera tracking and inter-camera tracking differently and obtains the optimization in a global graph model. The results show that our method can work better even in the condition of poor single camera object tracking.

Index Terms: Multi-camera multi-object tracking, global graph model, non-overlapping visual object tracking

I. INTRODUCTION

TRACKING objects of interest is an important and challenging problem in intelligent visual surveillance systems [1]. Since visual surveillance systems provide a huge amount of video streams, it is desirable that objects of interest can be automatically tracked by algorithms instead of by humans. Visual object tracking [2] is a long-standing problem in computer vision, and a great amount of effort has been made in visual object tracking within single cameras [3], [4], [5]. In intelligent visual surveillance systems [6], [7], due to the finite camera field of view, it is difficult to observe the complete trajectory of objects of interest in wide areas with only one camera. Hence, it is desired to enable the intelligent visual surveillance system to track the objects of interest within multiple cameras [8]. In addition, for practical considerations, the cameras of an intelligent visual surveillance system are usually installed with no overlapping areas. Thus, the intelligent visual surveillance system should be able to track objects of interest across multiple non-overlapping cameras.
In this paper, we focus on addressing the problem of tracking objects of interest across multiple non-overlapping cameras. As shown in Fig. 1 (Solution A), previous visual object tracking approaches tackle the problem in two different steps: single camera object tracking (SCT) [9], [10], [11] and inter-camera object tracking (ICT) [12], [13], [14]. SCT approaches [9], [10], [11] attempt to compute the trajectories of multiple objects from a single camera view, while ICT approaches [12], [13], [14] aim to find the correspondences among those trajectories across multiple camera views. These ICT approaches often use the trajectories obtained from SCT to achieve their data association, hence the overall tracking system is brittle and the overall performance depends on the results of the single camera object tracking module. For challenging scene videos, existing SCT approaches [15], [16], [17] are also fragile since the results often contain fragments and false positives. The direct disturbance of these false positives and fragments brings problems into the ICT module, such as the wrong matching problem, i.e. two targets in Camera 2 are matched to different tracklets of a same target in Camera 1 (see Fig. 2 (a)), and the tracklet missing problem, i.e. some tracklets of a target are missing during inter-camera tracking (see Fig. 2 (b)). These problems are inevitable as long as multi-camera object tracking is solved in two steps.

We address these problems by integrating the two separate modules and jointly optimizing them. We develop a global multi-camera object tracking approach. It integrates the two steps via an equalized global graph model to avoid these inevitable problems and aims to improve the overall performance of multi-camera object tracking.

*Weihua Chen and Lijun Cao contributed equally to this work. Kaiqi Huang is the corresponding author. The authors are with the Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, No. 95 ZhongGuanCun East St, HaiDian District, Beijing, P.R. China, 100190. E-mail: {weihua.chen, lijun.cao, xtchen, kqhuang}@nlpr.ia.ac.cn.
Considering the two different steps, we evaluate the overall performance from the following two criteria:

- Single camera object tracking: measuring how well the completed pedestrian trajectories in a single camera can be used to rebuild their exact historical paths in each scene.
- Inter-camera object tracking: evaluating how well the inter-camera matching helps to locate the pedestrians in a wide area.

As shown in Fig. 1 (Solution A), SCT and ICT share a similar data association framework: a graph modeling with an optimization solution. In the single camera object tracking module, the data association inputs are the initial observations, such as detections or tracklets, and the outputs are the integrated trajectories in each single camera (known as mid-term trajectories). These mid-term trajectories are then used as inputs to achieve the data association in inter-camera object tracking, and the outputs of the ICT approaches are the final integrated trajectories across multiple cameras (known as final trajectories). To integrate these two data associations, the straightforward idea is to establish a new data association which takes the initial observations as inputs and outputs the final trajectories directly.

Fig. 1. Illustration of three types of multi-camera visual object tracking solution.

However, a new problem arises, i.e. how to measure the similarity between two observations in the new graph. Some similarities are from observations which belong to the same camera, and others are from those which belong to different cameras. Under the same similarity metric, the average similarity score between observations in different cameras would commonly be lower¹ than that between observations in the same camera, because the appearance information and the spatio-temporal information of objects are less reliable in ICT than in SCT due to many factors (camera settings, viewpoints and lighting conditions). In this case, the optimization process makes the graph give priority to linking the observations along edges in the same camera instead of those across cameras, which would cause a failed optimization result for the whole multi-camera object tracking. To solve this problem, we have to handle two questions: how to distinguish the similarities in a same camera from those in different cameras, and how to balance them in the new graph?

In this paper, we improve the similarity metric, make a difference between the similarities of SCT and ICT, and equalize them in a global graph. A minimum uncertain gap [18] is adopted to establish the improved similarity metric. Thanks to this, the similarity scores in both SCT and ICT are equalized in the proposed global graph model.

The contributions of this paper² are as follows:
1) a global graph model for multi-camera object tracking is presented which integrates the SCT and ICT steps to avoid the inevitable problems;
2) an improved similarity metric is proposed to equalize the different similarities in the two steps and unify them in one graph;
3) the proposed approach is evaluated under a comprehensive evaluation criterion, which clearly shows that our method is more effective than the traditional two-step multi-camera visual tracking framework.

¹ The higher similarity score indicates a higher likelihood of the link for two observations.
² A preliminary version of this paper appeared in Chen et al. [19] and the source code is available at https://github.com/cwhgn/egtracker.

II. RELATED WORK

Using a graph model is an efficient and effective way to solve the data association problem in multi-camera visual object tracking. First, a graph modeling is used to form a solvable graph model with input observations (detections, tracklets, trajectories or pairs). It includes nodes, edges and weights. Then an optimization solution is brought in to solve the graph and obtain optimal or suboptimal solutions. The difference is that single camera object tracking (SCT) puts particular emphasis on the graph and the optimization solution, i.e. how to build a more efficient or more discriminative graph, while inter-camera object tracking (ICT) focuses on nodes, edges and weights, preferring a more effective feature representation. ICT has more complex and more sophisticated representations or similarity metrics (i.e. a transition matrix), but a simpler graph model. The proposed approach takes advantage of both SCT and ICT. The proposed similarity metric is extended from a classical inter-camera tracking method [20] and the global graph model takes advantage of a state-of-the-art SCT approach [21].

This section introduces related approaches for each part of SCT, ICT and MCT. Section II-A reviews single camera multi-object tracking. Section II-B discusses inter-camera object tracking with a brief introduction of object re-identification. Section II-C shows some other multi-camera object tracking approaches that take both SCT and ICT into account.

A. Single Camera Object Tracking (SCT)

In single camera multi-object tracking, the prediction of the spatio-temporal information of objects is more reliable and the appearance of objects does not vary much during tracking. This makes the SCT task less challenging than the ICT task, i.e. for some less challenging videos, a simple appearance representation (e.g. color histogram [22], [23], [24]) works well. The graph model is often used to solve different problems, such as occlusion [25], [26], crowds [24], [27] and interference of appearance similarity [28], [29]. However, for challenging videos, these approaches lead to frequent id-switch errors and trajectory fragments.

Existing approaches in SCT usually follow a data association-based tracking framework, which links short tracklets [19], [23], [30] or detection responses [31], [32], [33] into trajectories by a global optimization based on various kinds of features, such as motion (position, velocity) and appearance (color, shape).
The improvements always develop from two aspects: the graph model and the optimization solution. Some researchers focus on developing a new graph model for their tracklets or detections and aim to solve a specific problem. In Possegger et al. [26], a geodesic method is adopted to handle the occlusion problem. Dicle et al. [28] use motion dynamics to solve generalized linear assignments when targets with similar appearances exist. Other works in SCT focus on the improvement of the optimization solution framework, such as continuous energy minimization [34], linear programming [35], CRF [36] and the mixed integer program [37]. Zhang et al. [21] propose a maximum a posteriori (MAP) model to solve the data association of multi-object tracking, while Yang et al. [36] utilize an online CRF approach to handle the optimization with the benefit of distinguishing spatially close targets with similar appearances. These approaches can partly handle id-switches and trajectory fragments, but because their optimization is separated from ICT, they still leave many fragments and false positives to the ICT step.

Fig. 2. Illustration of the two matching problems: (a) wrong matching, (b) tracklet missing. Blue and red lines indicate two targets and arrows show the best matching. Target B is matched to tracklet A2 wrongly in (a). Tracklet A1 is missing in (b).

B. Inter-camera Object Tracking (ICT)

Inter-camera tracking is more challenging than SCT because of its more dramatic changes in appearance caused by many factors (camera settings, viewpoints and lighting conditions) and its less reliable spatio-temporal information in different camera views. As a result, how to learn a discriminative and invariant feature representation and a suitable similarity metric are the main problems in ICT. Most ICT works solve these problems through multi-camera calibration [38], [39], [40] and feature cues [41], [42], [43], [44], [45]. For multi-camera calibration, which is immobile information, the approaches always project the multiple scenes into a 3D coordinate system and achieve the matching by using the projected position information. Hu et al. [39] adopt a principal axis-based correspondence to achieve the calibration. For feature cues, most approaches utilize improved appearance or spatio-temporal information to achieve the matching. Kuo et al. [42] apply a multi-instance learning approach to learn an appearance affinity model, while Matei et al. [43] integrate appearance and spatio-temporal likelihoods within a multi-hypothesis framework.

From the perspective of graph modeling, a K-camera ICT data association can be treated as a K-partite graph matching problem. It is difficult to get the optimal solution, but there are many approaches to get suboptimal solutions, e.g. the weighted bipartite graph [46], the Hungarian algorithm [47] and the binary integer program [48]. The K-partite idea holds an assumption that each camera has had a perfect tracking result which should not be changed any more. In practice, the SCT result is not ideal and the assumption is broken. In this case, the SCT result should be modifiable and the data association is more like a global optimization problem than a K-partite graph matching problem.

At the end of introducing ICT, it is worth mentioning that object re-identification (Re-ID) is an important part of ICT. When the topology of the camera network is not available or the scenes are not overlapped, the spatio-temporal information is invalid. In this case, the appearance cue is the only information that can be used for matching. Studying object re-identification separately helps to better understand the capability of object matching by using visual features alone. Most object re-identification improvements mainly focus on certain appearance properties of objects, such as color [20], [49], shape [50], [51] and texture [52]. Recently, Li et al. [53] successfully applied a CNN to Re-ID to extract an effective feature representation. However, the highest identification rate is still below 0.3 under benchmarks and the approaches are also not practical. As we said, the ICT approaches share a common assumption that the single camera object tracking results are perfect, i.e. the trajectories in single cameras are all true positives and completely integrated. But until now, such results have been difficult to achieve.

C. Multi-camera Object Tracking (MCT)

A good MCT is the ultimate goal for any researcher in tracking. Most MCT methods follow the two-step framework: an SCT algorithm plus an ICT algorithm. In the Multi-Camera Object Tracking Challenge [54] of the ECCV 2014 visual surveillance and re-identification workshop, the methods of most participating teams are two-step approaches. The winner, the USC-Vision team, uses a state-of-the-art SCT method [32] and a state-of-the-art ICT method [41].

Besides two-step approaches, there are some multi-camera object tracking approaches [55], [56], [57], [58] concentrating on integrating the processes of SCT and ICT into one global graph, as this paper does. They mainly follow a tracking-by-detection paradigm and form a global association graph (see Fig. 1 (Solution C)). Yu et al. [56] propose a nonnegative discretization solution for data association and identify people across different cameras by face recognition. For real scenes with objects in a distant view, however, faces are too small to be recognized. Hofmann et al. [58] use a global min-cost flow graph and connect the different-view detections through their overlapping locations in a world coordinate space, which is not suitable for the non-overlapping camera problem.
In this paper, the proposed method uses tracklet observations as the inputs instead of object detections, because tracklets are more reliable for matching. We consider multi-camera object tracking as a global tracklet association under a panoramic view (see Fig. 1 (Solution B)). The similarities of different tracklets in the global tracklet association are treated differently according to the cameras they belong to. This framework provides a new solution for multi-camera object tracking when the SCT performance is not good enough for the further ICT process. Its local performance in a specific camera view may be as fragmentary as that of the traditional SCT methods, even though the inter-camera information may provide some useful feedback for each specific camera. But it overcomes the new problems emerging in ICT when SCT is not good and offers a better ICT performance. In practice, a better ICT has stronger practical significance than a better SCT. For a video surveillance system, it is more important to locate the objects in the whole wide area than in a single scene.

III. GLOBAL GRAPH MODEL

Our goal is to predict the trajectories by using the given series of observed videos. The proposed approach focuses on optimizing single camera tracking and inter-camera tracking in one global data association process. The data association is modeled as a global maximum a posteriori (MAP) problem, which is inspired by the MAP formulation of Zhang et al. [21]. The difference is that the input in the proposed solution is tracklets rather than object detections, and that the association aims to solve the wrong matching and the tracklet missing problems in ICT, while Zhang et al. [21] apply it to SCT. We outline the variable definitions in Table I.

TABLE I
NOTATIONS OF THE EQUALIZED GLOBAL GRAPH MODEL

$l_i$ : A single input tracklet consisting of several attributes, $l_i = [x_i, c_i, s_i, t_i, a_i]$.
$L$ : The set of all input tracklets, $L = \{l_1, l_2, \ldots, l_M\}$.
$\Gamma_i$ : A single trajectory hypothesis consisting of an ordered list of target tracklets, $\Gamma_i = \{l_{i_1}, l_{i_2}, \ldots, l_{i_k}\}$.
$\Gamma^*$ : The output of the algorithm, which is the optimal set of trajectory hypotheses.
$G$ : The min-cost flow graph, $G = \{N, E, W\}$.
$N$ : The set of nodes in the graph, $N = \{S, T, l_i^{enter}, l_i^{exit}\}$, $i \in [1, M]$.
$E$ : The set of edges in the graph, $E = \{e_i\} \cup \{e_{S,i}, e_{i,T}\} \cup \{e_{i,j}\}$, $i, j \in [1, M]$.
$W$ : The set of weights in the graph, $W = \{w_i\} \cup \{w_{S,i}, w_{i,T}\} \cup \{w_{i,j}\}$, $i, j \in [1, M]$.
$h_i^n$ : The MCSHR of tracklet $l_i$ in the $n$th frame.
$H_i$ : The incremental MCSHR for the whole tracklet $l_i$.
$\Lambda_{k,j}$ : The similarity between any MCSHR pair $h_i^k$ and $h_i^j$.
$\tau_i$ : The best periodic time for tracklet $l_i$.

Fig. 3. Illustration of the min-cost flow network. An example of the min-cost flow network with 3 timesteps and 6 tracklets. The numbers of N, E and W are 14, 21 and 21.

In our approach, a single trajectory hypothesis is defined as an ordered list of target tracklets, i.e. $\Gamma_i = \{l_{i_1}, l_{i_2}, \ldots, l_{i_k}\}$ where $l_{i_k} \in L$. The association trajectory hypothesis $\Gamma$ is defined as a set of single trajectory hypotheses, i.e. $\Gamma = \{\Gamma_i\}$. The objective of the data association is to maximize the posterior probability of $\Gamma$ given the tracklet set $L$ under the non-overlapping constraints [21]:

$$\Gamma^* = \arg\max_{\Gamma} \prod_i P(l_i \mid \Gamma) \prod_{\Gamma_k \in \Gamma} P(\Gamma_k), \quad \text{s.t. } \Gamma_i \cap \Gamma_j = \emptyset, \ \forall i \neq j. \tag{1}$$

$P(l_i \mid \Gamma)$ is the likelihood of tracklet $l_i$. The prior $P(\Gamma_k)$ is modeled as a Markov chain containing the transition probabilities $P(l_{k_{i+1}} \mid l_{k_i})$ of all tracklets in $\Gamma_k$ [58]. The transition probability $P(l_j \mid l_i)$ is computed by using the probabilities of the appearance feature $P_a(l_j \mid l_i)$ and the motion feature $P_m(l_j \mid l_i)$:

$$P(l_j \mid l_i) = \left(P_a(l_j \mid l_i)\right)^{k_1} \left(P_m(l_j \mid l_i)\right)^{k_2}, \tag{2}$$

where $k_1$ and $k_2$ are the weights of the two features.
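As an illustration of Eq. 1 and Eq. 2, the following minimal Python sketch scores one candidate set of trajectory hypotheses in the log domain. It is not the authors' implementation. The function names and the dictionary-based inputs are hypothetical, and the treatment of unused tracklets (contributing $1 - c_i$, as in the usual false-positive reading of this kind of MAP model) is an assumption that the text above does not state explicitly. Start and end of a trajectory are taken to be free of cost.

```python
import math


def transition_probability(p_app, p_mot, k1=1.0, k2=1.0):
    """Eq. 2: weighted product of appearance and motion probabilities."""
    return (p_app ** k1) * (p_mot ** k2)


def log_posterior(hypotheses, confidence, transition):
    """Log of the Eq. 1 objective for one candidate set of trajectories.

    hypotheses: list of trajectories, each an ordered list of tracklet ids
    confidence: dict id -> P(l_i | Gamma) when tracklet i is used
    transition: dict (i, j) -> P(l_j | l_i)
    """
    used = [t for traj in hypotheses for t in traj]
    if len(used) != len(set(used)):
        # Non-overlapping constraint of Eq. 1.
        raise ValueError("hypotheses must not share tracklets")
    used_set = set(used)
    score = 0.0
    for t, c in confidence.items():
        # Assumption: an unused tracklet is explained as a false positive.
        score += math.log(c) if t in used_set else math.log(1.0 - c)
    for traj in hypotheses:
        # Markov-chain prior: one transition term per consecutive pair.
        for a, b in zip(traj, traj[1:]):
            score += math.log(transition[(a, b)])
    return score


# Toy data (made up): three tracklets, one possible link 1 -> 2.
confidence = {1: 0.9, 2: 0.8, 3: 0.6}
transition = {(1, 2): transition_probability(0.7, 0.5)}
score = log_posterior([[1, 2]], confidence, transition)
```

Working with logarithms turns the products of Eq. 1 into sums, which is what allows the problem to be expressed as a sum of edge costs in a flow network.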
The MAP association model can be solved by a min-cost flow network [19]. The min-cost flow graph is formulated as $G = \{N, E, W\}$, where $N$, $E$ and $W$ stand for nodes, edges and weights respectively, and a weight means the cost of linking the edge. In the graph $G$, two nodes $l_i^{enter}$ and $l_i^{exit}$ are defined for each tracklet $l_i$. The observation edge $e_i$ from node $l_i^{enter}$ to node $l_i^{exit}$ indicates the likelihood of tracklet $l_i$. The corresponding observation weight $w_i$ is set to the negative log-odds of the likelihood $P(l_i \mid \Gamma)$:

$$w_i = -\log \frac{P(l_i \mid \Gamma)}{1 - P(l_i \mid \Gamma)}. \tag{3}$$

The possible linking relationship between any two tracklets is expressed as a transition edge $e_{i,j}$ from node $l_i^{exit}$ to node $l_j^{enter}$. The transition weight $w_{i,j}$ is the negative logarithm of the transition probability $P(l_j \mid l_i)$, which can also be decomposed into the probabilities in continuity of appearance and motion:

$$w_{i,j} = -\log P(l_j \mid l_i) = -k_1 \log P_a(l_j \mid l_i) - k_2 \log P_m(l_j \mid l_i). \tag{4}$$

In addition to these nodes and edges, there are two extra nodes $S$ and $T$. They are the virtual source and sink of the min-cost flow graph. The enter/exit edges $e_{S,i}$ and $e_{i,T}$ are also added to represent $l_i$ being the start tracklet or the end tracklet of a trajectory. The enter/exit weights of these edges are both set to 0 in this paper, because every tracklet could equally be a start or an end with no cost. In summary, the number of nodes $N$ is $2M + 2$, and the numbers of edges $E$ and weights $W$ are smaller than those of the fully connected graph, $3M + 2\binom{M}{2}$ ($M$ observation edges, $2M$ enter/exit edges and $M(M-1)$ directed transition edges), where $M$ is the total number of tracklets in all cameras. As shown in Fig. 3, the graph is solved by the min-cost flow, and the optimal solution is the maximum of the posterior probability of $\Gamma$ with the minimum cost. In the rest of this section, we introduce every part of the min-cost flow graph, especially the weights $W$.

A. Nodes

In the proposed approach, the tracklets extracted by a single-object tracking method are treated as input observations instead of detections. In other words, these tracklets are used to produce the nodes in the global graph model. One of the reasons is that they carry more information (like motion) than detections, which only contain appearance information. With more information, they can be considered as more credible nodes and the similarities between them are more reliable.
What is more, the number of tracklets is much smaller than that of detections. Using tracklets is therefore a good way to speed up the computing time of the graph optimization, which is also very important for practical usage.

In this paper, the deformable part-based model (DPM) detector [59] and an AIF tracker [60] are first used to get all the tracklets from each camera. After obtaining detections by the DPM detector, we use the AIF tracker to track every target and get their tracklets. During the target tracking by the AIF tracker, a confidence $\alpha_k$ [60] is calculated to evaluate the accuracy of the tracking result in frame $k$. If the confidence score is lower than the threshold $\theta$, i.e. $\alpha_k < \theta$, the tracker is considered to be lost. Then all confidence values of the target in previous frames are recorded and the average value $c_i$ is computed as the likelihood $P(l_i \mid \Gamma)$ of tracklet $l_i$:

$$c_i = P(l_i \mid \Gamma) = \frac{\sum_{k = t_i^{start}}^{t_i^{end}} \alpha_k}{t_i^{end} - t_i^{start}}, \tag{5}$$

where $t_i^{start}$ and $t_i^{end}$ are the start and end frames of tracklet $l_i$. In this way all the tracklets from all cameras are obtained, $L = \{l_1, l_2, \ldots, l_M\}$, where each tracklet $l_i = [x_i, c_i, s_i, t_i, a_i]$ consists of position, likelihood, camera view, time stamp and appearance information respectively. The nodes $N$ can be expressed as:

$$N = \{S, T, l_i^{enter}, l_i^{exit}\}, \quad i \in [1, M]. \tag{6}$$

B. Edges

Edges are also an important part of the graph model. All the observation edges and enter/exit edges are kept in the min-cost flow graph. However, only a part of the transition edges is retained, because not all of them are meaningful. Three rules are built for selecting transition edges in our graph. Firstly, for edge $e_{i,j}$, the start frame $t_j^{start}$ of tracklet $l_j$ must be after the end frame $t_i^{end}$ of tracklet $l_i$ without any overlapping frame. This rule ensures the uniqueness of objects in every frame and keeps the edges directed. Secondly, the two tracklets $l_i$ and $l_j$ should come from the same camera or from two cameras with an existing topological connection, which ensures that the link of the two tracklets is possible from a panoramic view. Thirdly, a waiting time threshold $\eta$ is brought in to limit the link of two tracklets. If the time interval between two tracklets is longer than the threshold $\eta$, the likelihood of this link is close to zero.
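The three rules for keeping transition edges can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the `Tracklet` class, the function name and the representation of the camera topology as a set of ordered camera pairs are all hypothetical, and the start and end frames of different cameras are assumed to share one synchronized clock.

```python
from dataclasses import dataclass


@dataclass
class Tracklet:
    cam: int    # camera view s_i
    start: int  # start frame of the tracklet
    end: int    # end frame of the tracklet


def select_transition_edges(tracklets, topo, eta):
    """Return the (i, j) pairs that satisfy the three edge rules.

    topo: set of ordered camera pairs (cam_a, cam_b) that are connected
    eta:  waiting time threshold, in frames
    """
    edges = []
    for i, a in enumerate(tracklets):
        for j, b in enumerate(tracklets):
            if i == j:
                continue
            gap = b.start - a.end
            # Rule 1 (no temporal overlap) and rule 3 (waiting time below eta).
            if not 0 < gap < eta:
                continue
            # Rule 2: same camera, or cameras with a topological connection.
            if a.cam != b.cam and (a.cam, b.cam) not in topo:
                continue
            edges.append((i, j))
    return edges


# Toy data (made up): four tracklets in three cameras, cameras 1 -> 2 connected.
tracklets = [
    Tracklet(cam=1, start=0, end=10),
    Tracklet(cam=1, start=15, end=30),
    Tracklet(cam=2, start=40, end=60),
    Tracklet(cam=3, start=12, end=20),
]
edges = select_transition_edges(tracklets, topo={(1, 2)}, eta=50)
num_nodes = 2 * len(tracklets) + 2  # enter/exit node per tracklet plus S and T
```

In this toy case only three of the twelve possible directed transition edges survive, which illustrates why the pruned graph is much smaller than the fully connected one.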
As a result, the edges that meet all requirements are selected and retained:

$$E = \{e_i\} \cup \{e_{S,i}, e_{i,T}\} \cup \{e_{i,j}\}, \quad i, j \in [1, M], \quad 0 < t_j^{start} - t_i^{end} < \eta, \quad Topo(s_i, s_j) = 1, \tag{7}$$

where $Topo(s_i, s_j) = 1$ means that the camera views $s_i$ and $s_j$ have an existing topological connection. For all these selected edges $E$, the flow is restricted to 0 or 1 (unit capacity), because every target should be at one and only one place at the same time. If the flow on an edge is 1 in the optimal solution, this link exists and the two tracklets of this link belong to the same target.

Fig. 4. Illustration of computing the periodic time for a tracklet. An example for a tracklet with the length $\varpi$ of 9 frames. The Avg Sim column shows the validity of every possible periodic time. The maximum in the Avg Sim column indicates the best periodic time $\tau$ for this tracklet.

C. Weights

Weights are an essential attribute of links and are used to represent relationships between nodes. In this paper, we use the similarities among tracklets as weights to indicate the cost of building links. As mentioned above, the weights $W$ consist of three parts, the same as the edges:

$$W = \{w_i\} \cup \{w_{S,i}, w_{i,T}\} \cup \{w_{i,j}\}, \quad i, j \in [1, M]. \tag{8}$$

The observation weights can be obtained according to Eq. 5, and the enter/exit weights are all set to 0 as mentioned above. In the transition weights, the appearance similarity $P_a(l_j \mid l_i)$ and the motion similarity $P_m(l_j \mid l_i)$ are used to form the weights:

$$w_i = -\log \frac{P(l_i \mid \Gamma)}{1 - P(l_i \mid \Gamma)} = -\log \frac{c_i}{1 - c_i},$$
$$w_{S,i} = w_{i,T} = 0, \quad i \in [1, M],$$
$$w_{i,j} = -\log P(l_j \mid l_i) = -k_1 \log P_a(l_j \mid l_i) - k_2 \log P_m(l_j \mid l_i). \tag{9}$$

In the following we introduce the two similarities respectively.

1) Appearance Similarity: As shown in Section II, both SCT and ICT have their own representations and similarity metrics, and those in ICT methods are more sophisticated than those in SCT ones. In order to build an equalized metric, the proposed approach adopts an ICT representation, but it does not use any learning process, which would strongly increase the computing time. This representation is called the Piecewise Major Color Spectrum Histogram Representation (PMCSHR) [19]. It is an improved version of the Major Color Spectrum Histogram Representation (MCSHR) [20] with some periodicity information that is specific to pedestrians. MCSHR obtains the major colors of a target based on an online k-means clustering algorithm.
The original way of computing the MCSHR of a tracklet is to integrate the histograms of all frames together:

$$H_i = \frac{1}{\varpi_i} \sum_{n=1}^{\varpi_i} h_i^n, \tag{10}$$

where $h_i^n$ is the MCSHR of tracklet $l_i$ in the $n$th frame, $H_i$ is the incremental MCSHR [20] for the whole tracklet $l_i$, and $\varpi_i$ is the length of tracklet $l_i$.
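Eq. 10 amounts to a per-bin average over the frames of a tracklet. The short Python sketch below shows this under a simplifying assumption that is not part of the original MCSHR [20]: every per-frame representation is taken to be a fixed-length list of frequencies over a shared set of color bins. The function name is hypothetical.

```python
def incremental_mcshr(frame_hists):
    """Average per-frame histograms into one tracklet-level histogram (Eq. 10).

    frame_hists: list of equally long lists, one histogram per frame
    """
    length = len(frame_hists)  # tracklet length in frames
    if length == 0:
        raise ValueError("a tracklet needs at least one frame")
    bins = len(frame_hists[0])
    if any(len(h) != bins for h in frame_hists):
        raise ValueError("all frame histograms must share the same bins")
    return [sum(h[b] for h in frame_hists) / length for b in range(bins)]


# Toy data (made up): three frames, two color bins.
H = incremental_mcshr([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Averaging over the whole tracklet discards any variation of appearance inside the tracklet, which is the limitation that a piecewise representation is meant to address.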

As non-rigid targets, pedestrians are challenging objects to track even with the help of the MCSHR. However, we can make some assumptions to help tracking. We assume that pedestrians always walk at a constant speed in the scenes, and the goal of our approach is to find the periodic time $\tau_i$ used to segment the tracklets. All MCSHRs $\{h_i^1, h_i^2, \ldots, h_i^{\varpi_i}\}$ of the tracklet $l_i$ are first obtained, and then the similarity $\Lambda_{k,j}$ between any pair $h_i^k$ and $h_i^j$ is computed. The intuition is to evaluate all the possible periodic times and find the best one. For a certain periodic time $t$, the similarity $\Lambda_{j,j+t}$ between $h_i^j$ and its next periodic counterpart $h_i^{j+t}$ is collected for every frame, and the average similarity is taken as the value that determines the validity of this periodic time. As shown in Fig. 4, the periodic time with the highest validity is taken as the best periodic time $\tau_i$ for tracklet $l_i$:

$$\tau_i = \arg\max_{t} \frac{1}{\varpi_i - t} \sum_{j=1}^{\varpi_i - t} \Lambda_{j,j+t}, \quad t \in [\gamma, \varpi_i/2). \tag{11}$$

The set $[\gamma, \varpi_i/2)$ limits the possible range of $t$, and $\gamma$ is set to 15. If $\gamma$ is too small, nearby frames have a strong similarity, which drives Eq. 11 to a false maximum. After this calculation, $\tau_i$ is the best periodic time for tracklet $l_i$. The tracklet $l_i$ can then be evenly segmented into pieces of length $\tau_i$ (except the end part). For each piece, the incremental MCSHR is computed. The PMCSHR of tracklet $l_i$ is represented by $\{H_i^1, H_i^2, \ldots, H_i^{d_i}\}$, where $d_i = \lceil \varpi_i / \tau_i \rceil$ is the number of pieces into which the tracklet $l_i$ is segmented. Then the similarity between every two pieces from tracklets $l_i$ and $l_j$ is computed, and the average similarity $Dis(l_i, l_j)$ is taken as the appearance similarity between the two tracklets:

$$P_a(l_i \to l_j) = Dis(l_i, l_j) = \frac{1}{d_i d_j} \sum_{n=1,\,m=1}^{d_i,\,d_j} Sim(H_i^n, H_j^m), \tag{12}$$

where $Sim(H_i^n, H_j^m)$ is the similarity metric for the incremental MCSHRs of two tracklets.

2) Motion Similarity: For a general method that works in both overlapping and non-overlapping views, it is hard to always build an exact 3D coordinate system onto which all scenes can be projected. Hence, in this paper a relative distance between two tracklets is adopted to measure the motion similarity. For two tracklets $l_i$ and $l_j$, it is easy to get their interval time by a simple subtraction. If the two tracklets are likely to belong to one target, the interval time $t_{inv}$ must be a positive number:

$$t_{inv} = t_j^{start} - t_i^{end}, \tag{13}$$

where $t_j^{start}$ is the start time of tracklet $l_j$ and $t_i^{end}$ is the end time of tracklet $l_i$.

With the interval time $t_{inv}$, and the position $x_i^{tail}$ and velocity $v_i^{tail}$ of tracklet $l_i$ at its end time, we can predict where the tracklet $l_i$ will be after $t_{inv}$. The new position is calculated as:

$$\hat{x}_i = x_i^{tail} + v_i^{tail} \cdot t_{inv}. \tag{14}$$

For tracklet $l_j$, we can do the same thing and get its predicted position $t_{inv}$ earlier:

$$\hat{x}_j = x_j^{head} - v_j^{head} \cdot t_{inv}. \tag{15}$$

As people usually walk along a smooth path in a real scene, we can assume that if the two tracklets belong to the same person, the corresponding predicted positions must be close to each other. In other words, $\hat{x}_i$ and $\hat{x}_j$ should be close to $x_j^{head}$ and $x_i^{tail}$ respectively. Therefore, the distances $\Delta x_i$ and $\Delta x_j$ between the predicted positions and the original positions are used to represent the motion similarity between two tracklets (see Fig. 5). The motion similarity within a single camera is computed as:

$$P_m(l_i \to l_j) = \exp\left(-\frac{\lambda}{2}(\Delta x_i + \Delta x_j)\right), \quad s_i = s_j. \tag{16}$$

Fig. 5. Illustration of the calculation of the relative distance.

As shown in Eq. 16, the relative distance is only valid for two tracklets from the same camera. If the tracklets are from different cameras, the interval time is partly invalid, because in inter-camera cases the paths between cameras are hard to measure, which renders the interval time useless for predicting positions. In this case, the relative distance mostly tends to be a huge, wrong number. To handle this problem, a minimum relative distance is applied to compute the similarity across cameras, which is comparable with Eq. 16.

Enter/exit areas are commonly used in uncalibrated camera systems to help re-locate the exact positions of targets. Hence, we labeled the enter/exit areas of each camera view with the help of topology information (see Fig. 6). For a person, if she disappeared from an exit area, we assume that she can be found in the enter area of the possible corresponding camera (see Fig. 7 (a)). If she disappeared from an area near an exit area, she can re-appear in the possible corresponding enter area with a high probability. Under this assumption, we manually set a disappearing point for each area to connect cameras. Then a minimum relative distance $\Delta x^{min}$ to the disappearing point during the whole interval time is adopted to measure the motion similarity across cameras instead of the original relative distance $\Delta x$, as seen in Fig. 7 (b) and (c):

$$\Delta x_i^{min} = \begin{cases} \min\limits_{t \in [1, t_{inv}]} \lVert x_i^{tail} + v_i^{tail} t - x_{s_i}^{exit} \rVert_2, & \text{if } x_i^{tail} \notin Area_{exit}, \\ 0, & \text{if } x_i^{tail} \in Area_{exit}. \end{cases} \tag{17}$$

$$\Delta x_j^{min} = \begin{cases} \min\limits_{t \in [1, t_{inv}]} \lVert x_j^{head} - v_j^{head} t - x_{s_j}^{enter} \rVert_2, & \text{if } x_j^{head} \notin Area_{enter}, \\ 0, & \text{if } x_j^{head} \in Area_{enter}. \end{cases} \tag{18}$$

$$P_m(l_i \to l_j) = \exp\left(-\frac{\lambda}{2}(\Delta x_i^{min} + \Delta x_j^{min})\right), \quad s_i \neq s_j, \tag{19}$$

where $x_s^{exit}$ and $x_s^{enter}$ are the positions of the disappearing points for the exit area and the enter area in camera $s$ respectively.

Fig. 6. Illustration of the enter/exit areas for the multi-camera visual object tracking. The enter/exit areas for links from Cam 1 to Cam 2 are in column (a), while those from Cam 2 back to Cam 1 are in column (b). The blue and yellow areas indicate the exit and enter areas respectively, and the red points represent the disappearing points.

Fig. 7. Illustration of the computing method for the minimum relative distance across cameras. In column (b), $x_i^{tail}$ and $x_j^{head}$ are in the exit and enter areas respectively, which indicates that both $\Delta x_i^{min}$ and $\Delta x_j^{min}$ are set to 0. The red lines in column (c) are $\Delta x_i^{min}$ and $\Delta x_j^{min}$.

Another benefit of the minimum relative distance is that it is measured within each camera, so it can be compared with the relative distance. With its help, the motion similarity metric can be extended from a single camera to a multi-camera system and can be considered as well equalized in the global graph. The final equalized motion similarity metric is:

$$P_m(l_i \to l_j) = \begin{cases} \exp\left(-\frac{\lambda}{2}(\Delta x_i + \Delta x_j)\right), & \text{if } s_i = s_j, \\ \exp\left(-\frac{\lambda}{2}(\Delta x_i^{min} + \Delta x_j^{min})\right), & \text{if } s_i \neq s_j, \end{cases} \tag{20}$$

where $\lambda$ is set to 0.01 in the experiments.

IV. EQUALIZED GRAPH MODEL

When tracking objects in a single camera, we assume that observations are obtained under the same circumstances, such as illumination and angle of view. Hence the targets have a strong invariance in their appearance representations, which can further be used for tracking. During inter-camera object tracking, this invariance is weaker due to the changes in circumstances. When we establish the graph with nodes and edges, this phenomenon causes the inter-camera similarities to be much lower than the similarities within a single camera. If we use Eq. 12 to compute the appearance similarities and provide no alignment or equalization for the two similarity distributions, the optimization process preferentially links the edges within a single camera all the time and ignores the inter-camera links as long as there is an edge with a higher similarity in the same camera. It is hard to get an accurate alignment of the two similarity distributions, and the proposed approach offers a suitable alignment which can be considered as a compensation for the inter-camera similarities. Our purpose is to equalize the difference between the two similarity distributions and at the same time keep the distribution of the inter-camera similarity unaffected. So our equalization is mainly applied to the distribution of the single camera similarity, to make it close to the inter-camera similarity distribution:

$$P_a(l_i \to l_j) = \sigma\,(Dis(l_i, l_j) - \mu), \quad \mu \geq 0, \quad s_i = s_j, \tag{21}$$

where $\sigma$ and $\mu$ are the compensation factors, and the similarity $Dis(l_i, l_j)$ between tracklets $l_i$ and $l_j$ is obtained by Eq. 12. The factor $\mu$ is used to shift the average level of the single camera similarity distribution and the factor $\sigma$ is adopted to control the amplitude of variation. They are computed from the two similarity distributions:

$$\mu = \mu_1 - \mu_2, \quad \sigma = \sigma_2 / \sigma_1, \tag{22}$$

where $\mu_1$ and $\sigma_1$ are the mean and variance of the single camera similarity distribution, computed from all the single camera edges, and $\mu_2$ and $\sigma_2$ are those of the inter-camera similarity distribution, obtained from all the inter-camera edges. However, not all the edge similarities are reliable and suitable for computing the mean and variance. Some contain a large proportion of noise and should be excluded as outliers. In this paper, a minimum uncertainty gap (MUG) [18] is brought in to help filter the edges used for computing the mean and

variance. The MUG is used to measure the uncertainties of the likelihoods between tracklets. A tracklet link with a small MUG can be considered a more reliable link, because its similarity is more stable and more believable. As a result, the MUG is treated as a confidence factor for edges:

$$MUG(l_i, l_j) = \max Sim(H_i^n, H_j^m) - \min Sim(H_i^n, H_j^m), \quad n \in [1, d_i],\ m \in [1, d_j]. \tag{23}$$

Therefore, with the help of the MUG filtration, the mean and variance are computed as follows:

$$\mu_1 = MEAN(Dis(l_i, l_j)), \quad \sigma_1 = VAR(Dis(l_i, l_j)), \quad MUG(l_i, l_j) < \varepsilon,\ s_i = s_j. \tag{24}$$

$$\mu_2 = MEAN(Dis(l_i, l_j)), \quad \sigma_2 = VAR(Dis(l_i, l_j)), \quad MUG(l_i, l_j) < \varepsilon,\ s_i \neq s_j, \tag{25}$$

where $\varepsilon$ is a confidence threshold, and MEAN() and VAR() are the mean and variance operations respectively. The final equalized appearance similarity metric becomes:

$$P_a(l_i \to l_j) = \begin{cases} Dis(l_i, l_j), & \text{if } s_i \neq s_j, \\ \sigma\,(Dis(l_i, l_j) - \mu), & \text{if } s_i = s_j. \end{cases} \tag{26}$$

V. EXPERIMENT RESULTS

In this section, the proposed approach is evaluated from the following aspects. First, the global graph model is compared with the traditional two-step framework, where we use the same feature representation for fairness. Second, a performance comparison between the equalized graph and the non-equalized one is provided to show the effectiveness of the equalization process with the improved similarity metric. Third, the proposed approach is compared with some state-of-the-art Multi-Camera Tracking (MCT) methods. However, as there is no benchmark for MCT, we first introduce a dataset and a comprehensive evaluation criterion, which can be developed into a benchmark in further work. The dataset, called the NLPR MCT dataset, is specialized for multi-camera pedestrian tracking in non-overlapping cameras. The details of the dataset are presented in Section V-A. The proposed evaluation criterion for MCT is introduced in Section V-B.

A. Datasets

For a comprehensive performance evaluation, it is crucial to develop a representative dataset. There are several datasets for visual tracking in surveillance scenarios, such as the PETS [61], CAVIAR [62], TUD [63] and i-LIDS [64] databases. However, most of them are designed for multi-object tracking in a single camera and are not suitable for inter-camera object tracking.
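As an aside on the equalization of Section IV, Eqs. 21 to 26 can be sketched in a few lines. This is only a sketch under stated assumptions: each edge is given as the flat list of its pairwise piece similarities $Sim(H_i^n, H_j^m)$, VAR() is taken to be the population variance (the text does not say which estimator is used), and the helper names `mug`, `dis`, `compensation_factors` and `equalized_appearance` are hypothetical. The default threshold 0.4 is the value of $\varepsilon$ reported for the experiments.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

# One edge = the pairwise piece similarities Sim(H_i^n, H_j^m) of two tracklets.
Edge = Sequence[float]


def dis(edge: Edge) -> float:
    """Average piece similarity of an edge (Eq. 12)."""
    return mean(edge)


def mug(edge: Edge) -> float:
    """Minimum uncertainty gap of an edge: max minus min similarity (Eq. 23)."""
    return max(edge) - min(edge)


def reliable_dis(edges: Sequence[Edge], eps: float) -> List[float]:
    """Dis values of the edges that pass the MUG filter MUG < eps."""
    return [dis(e) for e in edges if mug(e) < eps]


def compensation_factors(single_edges: Sequence[Edge],
                         inter_edges: Sequence[Edge],
                         eps: float = 0.4) -> Tuple[float, float]:
    """Return (mu, sigma) as in Eqs. 22, 24 and 25.

    Assumption: VAR() is the population variance.
    """
    single = reliable_dis(single_edges, eps)
    inter = reliable_dis(inter_edges, eps)
    if len(single) < 2 or len(inter) < 2:
        raise ValueError("need at least two reliable edges of each kind")
    var_single = pvariance(single)
    if var_single == 0:
        raise ValueError("single-camera similarities have zero variance")
    mu = mean(single) - mean(inter)
    sigma = pvariance(inter) / var_single
    return mu, sigma


def equalized_appearance(d: float, same_camera: bool, mu: float, sigma: float) -> float:
    """Eq. 26: only single-camera similarities are shifted and rescaled."""
    return sigma * (d - mu) if same_camera else d


if __name__ == "__main__":
    single = [[0.8, 0.8], [0.6, 0.6], [0.9, 0.1]]  # last edge fails the MUG filter
    inter = [[0.5, 0.5], [0.3, 0.3]]
    mu, sigma = compensation_factors(single, inter)
    print(mu, sigma, equalized_appearance(0.8, True, mu, sigma))
```

In the toy data the third single-camera edge has a gap of 0.8 and is dropped before the statistics are computed, so one noisy link does not distort the compensation factors.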
PETS was recorded in a simulated environment with overlapping cameras rather than in a real scene, while i-LIDS targets indoor multi-camera object tracking and its ground truths are not freely available so far. For these reasons, a new pedestrian dataset is constructed in this paper for multi-camera object tracking to facilitate the tracking evaluation.

The NLPR MCT dataset (see footnote 3) consists of four sub-datasets. Each sub-dataset includes 3-5 cameras with non-overlapping scenes and presents a different situation in terms of the number of people (ranging from 14 to 255) and the level of illumination changes and occlusions. The collected videos contain both real scenes and simulated environments. We also list the topological connection matrices for the pedestrian walking areas. All the videos are nearly 20 minutes long (except Dataset 3) with a rate of 25 fps and are recorded under non-overlapping views during the daytime, which makes the dataset a good representation of different situations in normal life. The connection relationships between scenes are shown in Fig. 8, where the enter/exit areas used in this paper are also marked.

B. Evaluation Criteria

Both SCT and ICT have their own evaluation criteria. Most SCT trackers use the multi-object tracking accuracy (MOTA) and ID switch [65] as their evaluation criteria, while some SCT papers prefer other terms [11], [24], [42]. In ICT, the ID switch is also a necessary term. There are two criteria mentioned in Section I which are important to a multi-camera multi-object tracking system. The SCT module and the ICT module correspond to these two criteria respectively. As the two criteria are equally crucial for multi-camera object tracking performance, they should be considered equally important in the final performance measurement. Nevertheless, in today's multi-camera object tracking, there is rarely a widely accepted performance measurement that takes both criteria into account. The common criterion researchers use for multi-camera object tracking is an extension of MOTA. It adds the ID switches in SCT and in ICT together, which ignores the different incidence densities of the ID switches in SCT and ICT. In most video scenes, as shown in Table II, the ground truths used for frame matching in SCT far outnumber those in ICT.
This leads to trackers caring more about the trajectories within a single camera than about the inter-camera matching. In this paper, we treat them separately and provide a new evaluation criterion to measure the performance of multi-camera object tracking. Our criterion takes both the SCT and ICT criteria into account and unifies them into one evaluation metric. The metric is called multi-camera object tracking accuracy (MCTA):

$$MCTA = Detection \times Tracking^{SCT} \times Tracking^{ICT} = \left(\frac{2 \cdot Precision \cdot Recall}{Precision + Recall}\right)\left(1 - \frac{\sum_t mme_t^s}{\sum_t tp_t^s}\right)\left(1 - \frac{\sum_t mme_t^c}{\sum_t tp_t^c}\right). \tag{27}$$

It is also modified from MOTA [65] and can be applied to multi-camera object tracking. It avoids the disadvantage of MOTA, which can be negative due to false positives. The MCTA ranges from 0 to 1. The metric contains three parts: detection ability, SCT ability and ICT ability, which correspond to the three brackets in Eq. 27. The Precision and Recall are integrated by the F1-score to measure the detection power and the occlusion handling ability. In this paper, the experiments focus on testing the SCT and ICT abilities of the proposed approach, so for the first two experiments we use the ground truths of object detections as the inputs instead of running a real detector, which leads to Precision = 1 and Recall = 1. In the last experiment, a DPM [59] detector is used to get the detection results.

$$Detection = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}, \quad Precision = 1 - \frac{\sum_t fp_t}{\sum_t r_t}, \quad Recall = 1 - \frac{\sum_t m_t}{\sum_t g_t}, \tag{28}$$

where $fp_t$, $r_t$, $m_t$ and $g_t$ are the numbers of false positives, hypotheses, misses and ground truths respectively at time $t$.

$$Tracking^{SCT} = 1 - \frac{\sum_t mme_t^s}{\sum_t tp_t^s}, \quad Tracking^{ICT} = 1 - \frac{\sum_t mme_t^c}{\sum_t tp_t^c}. \tag{29}$$

For the SCT and ICT ability parts, we measure the abilities via the number of mismatches (ID switches). We split the number of mismatches $mme_t$ in MOTA [65] into $mme_t^s$ and $mme_t^c$. $mme_t^s$ represents the number of mismatches that happen within a single camera and $mme_t^c$ is the number of inter-camera mismatches. $tp_t^s$ and $tp_t^c$ are the numbers of frame matchings in the ground truths: $tp_t^s$ counts the matchings whose two frames are from the same camera, and $tp_t^c$ counts the inter-camera matchings. It is worth noting that both $tp_t^s$ and $tp_t^c$ are counted among the true positive detection results. A new target is counted as an inter-camera ground truth by default in our criterion.

Footnote 3: http://mct.idealtest.org/datasets.html

Fig. 8. Illustration of the topological relationship during tracking. The topological relationships for every dataset are shown in the right column, and the blue polygons stand for the enter/exit areas used in our experiments for Dataset 1-4.

TABLE II
THE SINGLE-CAMERA AND INTER-CAMERA GROUND TRUTHS FOR ALL FOUR SUB-DATASETS.

            Tracking SCT   Tracking ICT
Dataset1    71853          334
Dataset2    88419          408
Dataset3    18187          152
Dataset4    42615          256

C. Global Graph Model vs Two-Step Framework

The advantage of the proposed method is that it improves the ICT performance under an imperfect SCT result. So in this section, the proposed global graph model is compared with the traditional two-step framework, i.e. an SCT approach plus an ICT approach.
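As a brief recap of the criterion in Section V-B, the MCTA of Eqs. 27 to 29 can be sketched as follows. The sketch assumes that the mismatch and matching counts have already been summed over time, and the function names `f1_score` and `mcta` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Detection term of Eq. 28 (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcta(precision: float, recall: float,
         mme_s: int, tp_s: int, mme_c: int, tp_c: int) -> float:
    """Multi-camera object tracking accuracy, following Eq. 27.

    mme_s / mme_c: single-camera and inter-camera mismatches, summed over time.
    tp_s / tp_c:   single-camera and inter-camera ground-truth matchings,
                   summed over time.
    """
    if tp_s <= 0 or tp_c <= 0:
        raise ValueError("matching counts must be positive")
    detection = f1_score(precision, recall)
    tracking_sct = 1 - mme_s / tp_s  # Eq. 29, SCT part
    tracking_ict = 1 - mme_c / tp_c  # Eq. 29, ICT part
    return detection * tracking_sct * tracking_ict


if __name__ == "__main__":
    # Dataset 1 with ground-truth detections (Precision = Recall = 1):
    # matching counts from Table II, mismatches of the EqlA+M column in Table III.
    print(round(mcta(1.0, 1.0, mme_s=66, tp_s=71853, mme_c=49, tp_c=334), 4))
```

With these inputs the sketch gives 0.8525, which agrees with the MCTA reported for Dataset 1 in Table III. Because the inter-camera matching count is small, the 49 inter-camera mismatches account for almost all of the loss while the 66 single-camera mismatches barely move the score.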
We use the same MAP model to solve the data association in both the SCT and ICT steps of the two-step framework, aiming to remove the interference of different data association methods. Adopting the MAP model in SCT is presented in Zhang et al. [21]. However, using the MAP model in ICT is not a suitable solution when the tracking results in a single camera are perfect and unchangeable. But as we said in Section II-B, when the SCT results are not ideal, the data association in ICT should be more like a global optimization problem than a K-partite graph matching problem, and it can be solved by the MAP model. That is another reason why we use the MAP model to achieve the data association in ICT in the traditional two-step framework. As a complement, we also use the Hungarian algorithm [47], a classical data association method for ICT, to achieve the ICT step.

Fig. 9. Performance evaluation of the proposed approach under different parameter settings. The x-coordinate for all the figures is the confidence threshold $\theta$ of the AIF tracker, and the number in brackets is the corresponding number of tracklets. With the increase of $\theta$, the tracklet number grows and more tracklet fragments are produced. The y-coordinates in the three rows are the SCT mismatch number, the ICT mismatch number and the MCTA score respectively. The performance score under $\theta = 0$ is shown in the legend. The global graph method is the proposed approach. The two-step method with MAP is Zhang's work [21], which uses MAP to achieve the SCT process. The two-step methods with MAP and Hungary in the last two rows stand for the approaches that solve the ICT problem with MAP and the Hungarian algorithm [47].

TABLE III
EMPIRICAL COMPARISON OF THE PROPOSED APPROACH ON FOUR MULTI-CAMERA TRACKING DATASETS. THE BOLD INDICATES THE BEST PERFORMANCE.

                     NonA     EqlA     M        EqlA+M
Dataset1   mme^s     71       76       53       66
           mme^c     123      88       101      49
           MCTA      0.6311   0.7357   0.6971   0.8525
Dataset2   mme^s     83       109      67       93
           mme^c     201      164      126      107
           MCTA      0.5069   0.5973   0.6907   0.7370
Dataset3   mme^s     59       71       74       51
           mme^c     132      116      95       80
           MCTA      0.1312   0.2359   0.3735   0.4724
Dataset4   mme^s     125      137      123      128
           mme^c     187      169      188      159
           MCTA      0.2687   0.3388   0.2649   0.3778
Average MCTA         0.3845   0.4769   0.5066   0.6099

For fairness, the feature representation in this experiment is the PMCSHR appearance feature and the motion feature for all baselines. In this experiment, the waiting time threshold $\eta$ and the MUG threshold $\varepsilon$ are set to 60*25*1 and 0.4 respectively, and the weights of the two features $k_1$ and $k_2$ are both 1. To show the ability of the proposed approach to handle imperfect tracklets in SCT, the experiment changes the confidence threshold $\theta$ of the AIF tracker to produce more fragments artificially. The threshold $\theta$ ranges from 0 to 0.2 and the corresponding numbers of tracklets are listed beside the threshold in Fig. 9.
The total single-camera matching number $tp^s$ and inter-camera matching number $tp^c$ of the ground truths for each sub-dataset are listed in Table II. From the first two rows in Fig. 9, we can see that as the number of fragmented tracklets increases, both the single camera mismatch number $mme^s$ and the inter-camera mismatch number $mme^c$ grow significantly in the proposed global graph and in the two-step framework. In the first row, the single camera mismatch number $mme^s$ in the proposed global graph tends to be larger than that in the two-step framework [21], because the two-step framework performs an optimization in each camera, which gives it a better local result. In Dataset 3 and Dataset 4, however, the $mme^s$ in the proposed global graph becomes lower than that in the two-step framework [21]. The reason is that these two datasets were recorded under a simulated condition with many frequent walking-around behaviors. In this case, the inter-camera information may provide more useful feedback for each specific camera and can partly improve the SCT performance.

For the inter-camera mismatch number $mme^c$ in the middle row, the number in the proposed global graph is much lower than that of both the MAP and the Hungarian [47] variants of the two-step framework, which indicates the effectiveness of our global graph model in improving the ICT performance. In Dataset 4, it can be seen that the $mme^c$ in the proposed graph is not smaller than that in the two-step framework at first. However, with the increase of fragmented tracklets, the $mme^c$ in the proposed graph increases much more slowly and finally becomes smaller than that in the two-step framework. What is more, for the ICT step in the two-step framework, the data association method based on the global MAP is always better than that with the Hungarian algorithm [47]. This partly supports the assumption that, because of non-ideal SCT results, the data association in ICT is better treated as a global optimization problem than as a K-partite graph matching problem. In the last row, the MCTA of the global MAP always keeps the highest score, which implies that the proposed global graph model offers a better performance than the traditional two-step framework.

TABLE IV
PERFORMANCE COMPARISON USING THE GROUND TRUTHS OF SINGLE CAMERA OBJECT TRACKING AS INPUT.

                     Ours     USC-Vision   Hfutdspmct   CRIPAC-MCT
                              [32]+[41]    [54]         [19]
Dataset1   mme^c     55       27           86           113
           MCTA      0.8353   0.9152       0.7425       0.6617
Dataset2   mme^c     121      34           141          167
           MCTA      0.7034   0.9132       0.6544       0.5907
Dataset3   mme^c     39       70           40           44
           MCTA      0.7417   0.5163       0.7368       0.7105
Dataset4   mme^c     157      72           155          110
           MCTA      0.3845   0.7052       0.3945       0.5703
Average MCTA         0.6662   0.7625       0.6321       0.6333

TABLE V
PERFORMANCE COMPARISON USING THE GROUND TRUTHS OF OBJECT DETECTION AS INPUT.

                     Ours     USC-Vision   Hfutdspmct   CRIPAC-MCT
                              [32]+[41]    [54]         [19]
Dataset1   mme^s     66       63           77           135
           mme^c     49       35           84           103
           MCTA      0.8525   0.8831       0.7477       0.6903
Dataset2   mme^s     93       61           109          230
           mme^c     107      59           140          153
           MCTA      0.7370   0.8397       0.6561       0.6234
Dataset3   mme^s     51       93           105          147
           mme^c     80       111          121          139
           MCTA      0.4724   0.2427       0.2028       0.0848
Dataset4   mme^s     128      70           97           140
           mme^c     159      141          188          209
           MCTA      0.3778   0.4357       0.2650       0.1830
Average MCTA         0.6099   0.6003       0.4679       0.3954

D. Equalized vs Non-equalized Graph Model

This experiment is conducted to show the effectiveness of the similarity equalization process. All the trackers are run under our global graph model. We compare the equalized appearance similarity metric with the non-equalized one, and then combine it with our equalized motion metric. In this experiment, the confidence threshold $\theta$ of the AIF tracker is fixed to 0. The results are shown in Table III. NonA and EqlA are the results with the non-equalized and the equalized appearance features. M corresponds to the results with the equalized motion feature only, and EqlA+M combines the equalized appearance feature and the motion feature. It can be found that the result with the non-equalized appearance similarity has a lower single camera mismatch number $mme^s$ than the equalized one. This means that when we conduct equalization, the single camera performance drops due to the change of the single camera similarity distribution, which is unavoidable but acceptable. In inter-camera tracking, it is clear that the equalized appearance similarity greatly helps to reduce the number $mme^c$ of mismatches across cameras. When the equalized motion information is added, the $mme^c$ decreases further. The MCTA is the final comprehensive score which takes both SCT and ICT performance into account. The larger the score, the better the tracker performs.
As seen in Table III, the equalized appearance similarity combined with the equalized motion information has the highest score. This indicates that the increased single camera mismatch number $mme^s$ in our method is acceptable in order to reduce the inter-camera mismatch number $mme^c$ and get a higher score in the overall MCT performance. Furthermore, when we use the motion feature alone for multi-camera object tracking, the performance is comparable to and sometimes better than that of the appearance feature, which partly shows the effectiveness of our equalized motion similarity metric.

E. Equalized Global Graph Model vs State of The Arts

In this section, we compare our equalized global MAP graph model with other multi-camera object tracking methods. To be comparable, the methods must be able to handle both the SCT and the ICT steps. We compare the proposed graph with current two-step multi-camera object tracking methods. The methods are from the Multi-Camera Object Tracking (MCT) Challenge [54]. USC-Vision ([32], [41]) is the winner of the challenge and is considered the state-of-the-art two-step multi-camera object tracking approach.

We first conduct the comparison under the condition that the ground truths of single camera object tracking are available; the results are shown in Table IV. It reflects the ICT power of each method when the single camera object tracking results are perfect. From the average MCTA score we can see that USC-Vision ([32], [41]) is much better than our proposed method. This shows the advantage of USC-Vision's ICT method. In Table V, only the ground truths of object detections are available, so the trackers have to perform the single camera object tracking by themselves. In this case, their single camera object tracking results cannot be as perfect as the ground truths, and their inter-camera object tracking algorithms have to bear the resulting fragments and false positives. From Table V, although the SCT performance $mme^s$ of USC-Vision ([32], [41]) is generally better than ours, it is clear that the number of its ICT mismatches increases much more sharply than that of our method, which indicates that its powerful ICT method loses its advantage under imperfect SCT results. Example results are shown in Fig. 10. In the final evaluation, our equalized global graph model has the highest average MCTA score, which further shows the advantage of our proposed model in improving the ICT performance under an imperfect SCT result.

Finally, as perfect detection can never be achieved in reality, we do another experiment without the detection ground truths. We use the DPM detector [59] to get the detection results. In Table VI, the $Tracking^{SCT}$ and $Tracking^{ICT}$ of Eq. 29 are listed instead of $mme$, because different detection results lead to different $tp^s$. The results in Table VI show that our result is not the best but is comparable with the state of the art. Under a real detector, there are many missing and false positive detections. The ability of a multi-camera tracker to handle these missing or false positive detections mainly comes from its SCT part. USC-Vision uses a hierarchical association to build its tracklets, in which the detections are selected discreetly and some missing detections can be partly complemented.
In our method, a real-time single object tracker [60] is adopted to get the tracklets, which can partly handle missing detections. But once the tracker drifts to a false detection, the whole tracklet becomes unreliable. Due to the benefits of the hierarchical association in the SCT step, USC-Vision has a more reliable set of tracklets than ours for the subsequent ICT step. Even with the help of the proposed equalized global graph, our final result is still lower than USC-Vision's. This does not deny the effectiveness of our equalized global graph model, but shows the advantage of USC-Vision's SCT method in handling misses and false positives. However, for practical use in a real environment, the detection-level association is much slower than a real-time single camera tracker. That is why we use the AIF tracker to get tracklets in our method instead of using USC-Vision's detection-based hierarchical association. Some other single object trackers, such as TLD [66], may handle the false-detection drifts through their online learning mechanisms. But learning the online models costs too much time and memory, which makes them hard to apply to forming our raw tracklets. As a result, a real-time single camera tracker that can deal with false detections is a promising direction of further work for multi-camera object tracking.

TABLE VI
PERFORMANCE COMPARISON WITHOUT THE GROUND TRUTHS OF OBJECT DETECTION. THE FINAL MCTA IS SHOWN AS BOLD FOR CLARITY.

                         Ours     USC-Vision   Hfutdspmct   CRIPAC-MCT
                                  [32]+[41]    [54]         [19]
Dataset1   precision     0.7967   0.6916       0.7113       0.1488
           recall        0.5929   0.6061       0.3465       0.2154
           Tracking SCT  0.9744   0.9981       0.9229       0.9955
           Tracking ICT  0.6220   0.9288       0.6534       0.7111
           MCTA          0.4120   0.5989       0.2810       0.1246
Dataset2   precision     0.7977   0.6948       0.7461       0.1431
           recall        0.6332   0.7843       0.3669       0.1933
           Tracking SCT  0.9779   0.9986       0.9347       0.9945
           Tracking ICT  0.6942   0.8507       0.6122       0.7510
           MCTA          0.4793   0.6260       0.2815       0.1075
Dataset3   precision     0.8207   0.4750       0.3342       0.0853
           recall        0.5345   0.6615       0.0986       0.1206
           Tracking SCT  0.9749   0.9904       0.9682       0.9715
           Tracking ICT  0.2953   0.1014       0.2432       0.1143
           MCTA          0.1864   0.0555       0.0359       0.0111
Dataset4   precision     0.8355   0.5216       0.7720       0.0606
           recall        0.6193   0.79375      0.1210       0.0944
           Tracking SCT  0.9275   0.9948       0.9865       0.9762
           Tracking ICT  0.4308   0.5437       0.2944       0.2950
           MCTA          0.2842   0.3404       0.0608       0.0213
Average MCTA             0.3405   0.4052       0.1648       0.0661

VI. CONCLUSION

To address the problem of multi-camera non-overlapping visual object tracking, we develop a joint approach that optimizes the single camera object tracking and the inter-camera object tracking in one graph. This joint approach overcomes the disadvantages of the traditional two-step tracking approaches. In addition, the similarity metrics of both the appearance and motion features in the proposed global graph are equalized. The equalization further reduces the number of mismatch errors in inter-camera object tracking. The results show its effectiveness for multi-camera object tracking, especially when the SCT performance is not perfect. Our approach focuses on the graph modeling instead of the feature representation learning. Any existing re-identification feature representation method can be incorporated into our framework.

REFERENCES

[1] R. Vezzani, D. Baltieri, and R. Cucchiara, "People reidentification in surveillance and forensics: A survey," ACM Comput. Surv., vol. 46, no. 29, 2013.
[2] A. W. M. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah, "Visual tracking: An experimental survey," IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 36, no. 7, pp. 1442-1468, 2014.
[3] Y. Pang and H. Ling, "Finding the best from the second bests - inhibiting subjective bias in evaluation of visual tracking algorithms," in IEEE International Conference on Computer Vision (ICCV), 2013.

Fig. 10. Samples of mismatches in the inter-camera tracking. Four inter-camera tracking examples from Dataset 1-4 are shown in (a)-(d): (a) Dataset 1, outdoor scene; (b) Dataset 2, outdoor scene; (c) Dataset 3, indoor scene; (d) Dataset 4, outdoor scene. The first row shows the results of the proposed method, and the second row is from USC-Vision ([32], [41]). A red rectangle means that a mismatch happened across cameras.

[4] Y. Wu, J. Lim, and M.-H. Yang, "Online object tracking: A benchmark," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[5] K. Huang, L. Wang, T. Tan, and S. Maybank, "A real-time detecting tracking distant objects system for night surveillance," Pattern Recognition, vol. 41, no. 1, pp. 432-444, 2008.
[6] K. Huang and T. Tan, "Vs-star: a visual interpretation system for visual surveillance," Pattern Recognition Letters (PRL), pp. 2265-2285, 2010.
[7] P. L. Venetianer and H. Deng, "Performance evaluation of an intelligent video surveillance system - a case study," Computer Vision and Image Understanding (CVIU), vol. 114, no. 11, pp. 1292-1302, 2010.
[8] X. Wang, "Intelligent multi-camera video surveillance: A review," Pattern Recognition Letters (PRL), vol. 34, pp. 3-19, January 2012.
[9] J. Liu, P. Carr, R. T. Collins, and Y. Liu, "Tracking sports players with context-conditioned motion models," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[10] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. J. V. Gool, "Online multiperson tracking-by-detection from a single, uncalibrated camera," IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 33, pp. 1820-1833, 2011.
[11] C.-H. Kuo and R. Nevatia, "How does person identity recognition help multi-person tracking?" in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1217-1224.
[12] R. Hamid, R. Kumar, M. Grundmann, K. Kim, I. Essa, and J. Hodgins, "Player localization using multiple static cameras for sports visualization," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 731-738.
[13] X. Chen, K. Huang, and T. Tan, "Object tracking across non-overlapping views by learning inter-camera transfer models," Pattern Recognition, vol. 47, pp. 1126-1137, March 2014.
[14] Y. Cai, W. Chen, K. Huang, and T. Tan, "Continuously tracking objects across multiple widely separated cameras," in Asian Conference on Computer Vision. Springer, 2007, pp. 843-852.
[15] A. Segal and I. Reid, "Latent data association: Bayesian model selection for multi-target tracking," in IEEE International Conference on Computer Vision (ICCV), 2013, pp. 2904-2911.
[16] C. Arora and A. Globerson, "Higher order matching for consistent multiple target tracking," in IEEE International Conference on Computer Vision (ICCV), 2013, pp. 177-184.
[17] A. Butt and R. Collins, "Multi-target tracking by lagrangian relaxation to min-cost network flow," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1846-1853.
[18] J. Kwon and K. Lee, "Minimum uncertainty gap for robust visual tracking," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2013, pp. 2355-2362.
[19] W. Chen, L. Cao, X. Chen, and K. Huang, "A novel solution for multi-camera object tracking," in IEEE International Conference on Image Processing (ICIP), 2014, pp. 2329-2333.
[20] M. Piccardi and E. Cheng, "Multi-frame moving object track matching based on an incremental major color spectrum histogram matching algorithm," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2005, p. 19.
[21] L. Zhang, Y. Li, and R. Nevatia, "Global data association for multi-object tracking using network flows," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1-8.
[22] X. Chen, Z. Qin, L. An, and B. Bhanu, "An online learned elementary grouping model for multi-target tracking," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1242-1249.
[23] A. Zamir, A. Dehghan, and M. Shah, "Gmcp-tracker: Global multi-object tracking using generalized minimum clique graphs," in European Conference on Computer Vision (ECCV), 2012, pp. 343-356.