Statistical Filters for Crowd Image Analysis


Ákos Utasi, Ákos Kiss and Tamás Szirányi
Distributed Events Analysis Research Group, Computer and Automation Research Institute
H-1111 Budapest, Kende utca 13-17, Hungary
{utasi, akos.kiss, sziranyi}@sztaki.hu

Abstract

A mass of people can behave like a random moving swarm whose complexity can be described by statistical features. This paper gives solutions for recognising unusual motion patterns from overall motion statistics. The resulting system is tested on the PETS2009 dataset scenarios S3: Event Recognition and S1: Person Count and Density Estimation, with convincing results.

1. Introduction

While there is a wide range of approaches, many of them cannot be applied in outdoor surveillance due to unreliable observation data. Surveillance applications face many problems, and as discussed in several papers (e.g. [1]) there is a significant gap between laboratory testing and real-life applications, where there are several sources of noise. This paper describes a statistics-based evaluation of special motion events of masses of people, tested on the PETS outdoor video dataset. Special events, such as walking, running, rapid dispersion, local dispersion, crowd formation and splitting, are estimated from statistics over global and local probabilistic models. We give a meaningful solution for unifying models of global motion statistics and local spatiotemporal flow estimation. A 4-dimensional Mixture of Gaussians (MOG) model is used to characterise the usual motion patterns depending on the location in the pedestrian area. A likelihood function gives the probability of a flow map, together with the global motion probability. The paper avoids the ambiguous definition of object shape or connectivity, as these change abruptly over time at any position. We show that relatively simple statistical models can provide adequate answers to the given questions.

2. Our approach

Our event recognition method is based on low-level motion statistics. We created several low-level detectors to model different properties of the dense optical flow vector field. Our method performs the following steps:

- Preprocessing: background-foreground separation, optical flow calculation and filtering;
- Low-level detectors: produce properties for a given observation (properties of the dense optical flow vector field);
- Event recognition: uses the output of the low-level detectors to categorise the event and provides a membership probability;
- State (event) determination: the event category with the highest membership probability is selected.

3. Preprocessing

3.1. Background-foreground separation

For background modelling we used the CIE L*U*V* uniform colour space. We trained MOGs for each pixel using EM [2]. For background-foreground separation we used the method proposed by [3], but we omitted the update procedure. Moreover, the variances of the L* channel were increased to handle lighting condition changes more effectively. A match in pixel (i, j) is defined as

∑_{c ∈ {L,U,V}} (y_{i,j}^c − µ_{i,j}^c)² / Σ_{i,j}^c < I   (1)

where y_{i,j}^c is the value of the pixel (i, j) in channel c ∈ {L, U, V}, µ and Σ denote the expected value and covariance respectively, and I = 6.0 is a constant [3]. The first B Gaussians are chosen as the background model:

B = arg min_b ( ∑_{l=1}^{b} w_l > T )   (2)

where in our case the parameter T was set to 1.0 to select all Gaussians in the model. Finally, morphological operators were used to clean the noisy foreground mask. Further improvements could be achieved by removing shadows from the foreground mask (e.g. using the technique of [4]).
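The per-pixel match test of Eq. (1) and the background-component selection of Eq. (2) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names are ours, and a diagonal (per-channel) variance is assumed.

```python
import numpy as np

def is_background_match(y, mu, var, I=6.0):
    """Eq. (1): a pixel y = (L, U, V) matches a Gaussian with mean mu and
    per-channel variance var if the normalised squared distance, summed
    over the three colour channels, stays below the constant I."""
    return np.sum((y - mu) ** 2 / var) < I

def select_background(weights, T=1.0):
    """Eq. (2): keep the first B components (heaviest weights first) whose
    cumulative weight exceeds T; T = 1.0 selects every component."""
    order = np.argsort(weights)[::-1]     # sort components by weight, descending
    cum = np.cumsum(weights[order])
    B = int(np.searchsorted(cum, T) + 1)  # smallest b with cumulative weight >= T
    return order[:B]
```

With T = 1.0, as in the paper, `select_background` returns all mixture components, so the match test alone decides foreground vs. background.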

3.2. Optical flow calculation and filtering

We used the method of [5] to calculate the optical flow, which was smoothed by a spatial median filter of radius r = 1. The optical flow vectors were transformed to polar coordinates and passed through several simple filters: vectors with unusually small or large magnitudes were dropped, as were vectors outside the foreground mask.

4. Low-level detectors

In this section we present the detectors we used to model different features of the optical flow field. The models of the detectors are trained on the PETS Regular flow training data.

4.1. Detecting unusual optical flow

Mixtures of Gaussians have been used successfully for motion segmentation in several previous works [7, 8, 9]. Having the training set of regular flows, we extracted the optical flow vectors and trained [2] a 4-dimensional (x, y, vx, vy) MOG model (location + velocity) with 64 components in the mixture. The model learns the location, speed and direction of the regular activity from the training set. Fig. 1 presents the means of the Gaussians of the mixture: a solid red line represents the mean direction and magnitude at the mean location, while the radii of the ellipses are proportional to the location variances.

Figure 1: Mixture of Gaussians (MOG) ellipses in 4 dimensions (x, y, vx, vy), represented by the cut at 2.5σ; the velocities are represented by small red vectors at the centre points.

Then for an incoming optical flow field O = {o_1, ..., o_K} (o_k = (x_k, y_k, vx_k, vy_k)) the probability can be expressed as

P(O) = ( ∏_{k=1}^{K} P_MOG(o_k) )^{1/K}   (3)

where P_MOG(o) = ∑_{l=1}^{M} w_l N(o | µ_l, Σ_l). During detection the optical flow vectors are collected from the video frames in a time window of size W = 5.

4.2. Detecting unusual magnitudes

In order to describe the typical magnitudes of the usual activity, a 1-dimensional Gaussian model was estimated from the training dataset. Before training the model, the optical flow vectors were transformed into 3D space in order to normalise the magnitudes. For calculating the probability of a set of magnitudes we used the formula of Eq. 3 with the Gaussian probability in the product.

5. Event categorisation

Our method currently recognises three types of events: regular activity, running and splitting. The recognition can easily be extended by using other low-level feature detector plugins.

5.1. Event recognition

Using the low-level feature detectors presented in Sec. 4, we calculated the mean probabilities (or mean log-probabilities) of the training data. Let P̄_i denote the mean probability of the i-th low-level detector, and D_i its standard deviation. Then, to express that a given low-level feature f_i with probability p_i is similar to the training dataset, we can define the membership similarity measure

M_sim(p_i) = N_R(p_i | P̄_i, D_i)   (4)

where N_R is a right-truncated Gaussian. Similarly, we can define the membership dissimilarity measure as

M_dissim(p_i) = N_L(p_i | P̄_i − K·D_i, D_i)   (5)

where N_L denotes the left-truncated Gaussian, and the value of K is typically 2.5-4.0 (in our case it was set to 3.5). Fig. 2 demonstrates the two functions.

Our event categorisation algorithm is based on the above membership functions. Regular activity, for example, is constructed from two similarity functions on the two low-level feature detectors (unusual flow and unusual magnitude) presented in Sec. 4. We defined the following three recognisers:

Regular activity (R_reg): usual flow (M_sim) and usual magnitudes (M_sim), see Fig. 3 top;
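As an illustration of the Sec. 4.1 detector, the sketch below trains a 4-D mixture and evaluates Eq. (3) in the log domain, where the K-th root of the product becomes the mean per-vector log-likelihood. It substitutes scikit-learn's EM implementation for the one used in the paper, and the placeholder training data and variable names are ours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Train the 4-D (x, y, vx, vy) regular-flow model of Sec. 4.1.
# `train_flow` stands in for the (N, 4) array of flow samples from the
# Regular flow training sequences; 64 components as in the paper.
rng = np.random.default_rng(0)
train_flow = rng.normal(size=(2000, 4))   # placeholder training data

mog = GaussianMixture(n_components=64, covariance_type='full',
                      random_state=0).fit(train_flow)

def flow_log_likelihood(O):
    """Eq. (3) in the log domain: log P(O) is the mean of the per-vector
    mixture log-densities over a flow field O = {o_1, ..., o_K}
    collected in a W-frame window."""
    return float(np.mean(mog.score_samples(O)))
```

In the full pipeline, the per-detector value p_i produced this way is then compared against the training statistics (P̄_i, D_i) through the truncated-Gaussian memberships of Eqs. (4)-(5).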

Figure 2: Membership similarity (black) and dissimilarity (red) functions.

Running (R_run): unusual flow (M_dissim) and unusual magnitudes (M_dissim), see Fig. 3 middle;

Split (R_split): unusual flow (M_dissim) and usual magnitudes (M_sim), see Fig. 3 bottom.

Each recogniser calculates the product of its membership similarity and dissimilarity values. Finally, the most probable (highest value) case is selected as the output state.

6. Experiments

We tested our event recognition system on the PETS Event recognition dataset, selecting the videos containing the running and splitting events. The false alarm ratio was extremely low; the confusion matrix is shown below.

Event     Regular   Run   Split
Regular   160       1     0
Run       1         115   0
Split     24        0     45

Please note that the end of the split event cannot be clearly defined, hence the performance might increase. The detected state sequences are shown in Fig. 4.

Figure 4: Most probable state sequences. Top: S3.L3 Sequence 1 (running) with timestamp 14-16; Bottom: S3.L3 Sequence 3 (split) with timestamp 14-31. States: 0 - regular, 1 - running, 2 - split.

7. Person Count and Density Estimation

For person count and density estimation we manually trained an estimator from the training sequences (S0 Regular flow), similar to [10]. Each image frame in the training set of M frames was segmented into N = 40 regions (40 equal columns), and for each region we collected the number of foreground pixels (Sec. 3.1), resulting in an N × M matrix denoted by F. Moreover, let p = [p_1, ..., p_M]^T denote the number of pedestrians in each frame of the training set. Using the ground truth data F and p, we can estimate the probable number of pedestrians per foreground pixel for each region, denoted by r and computed as the solution of

F r = p.   (6)

For an unknown image frame i we collect the foreground pixel counts in each region as the feature vector x_i; the number of pedestrians is then estimated as

p_i = x_i r.   (7)

Please note that only a subset of regions holds usable information; the others might be skipped, or their components might be computed from the nearby important regions. We used 800 frames from the S0 Regular flow training sequences to train our estimator, and the remaining 421 frames were used for testing. The result is shown in Fig. 5. The algorithm is fast, but occlusion strongly reduces its reliability. This is the reason for the deviation of ca. 5-6 detected persons in Fig. 5, which leads to the error diagrams in Fig. 6. The explanation for the high relative error values is that the error band has approximately the same width everywhere, indicating that with this model occlusion acts as additive rather than multiplicative noise.

8. Summary and Conclusions

In this paper we presented a probabilistic event classification system. The design of the proposed system allows us to easily integrate new low-level detector plugins to recognise other complex event classes. In the future we plan to use probabilistic models that take the duration of events into account (e.g. hidden semi-Markov models [6]). Moreover, our person count estimator can be improved by including temporal information in the model (e.g. using temporal filters).
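The region-based count estimator of Eqs. (6)-(7) amounts to one linear least-squares fit. A minimal sketch, assuming F is arranged with one training frame per row (M × N) so that F r = p is an ordinary overdetermined system; the function names are ours:

```python
import numpy as np

def fit_ratio(F, p):
    """Solve F r = p (Eq. 6) in the least-squares sense.
    F: (M, N) matrix of per-region foreground-pixel counts for M training
    frames and N = 40 column regions; p: (M,) ground-truth person counts.
    Returns r, the persons-per-foreground-pixel ratio of each region."""
    r, *_ = np.linalg.lstsq(F, p, rcond=None)
    return r

def estimate_count(x, r):
    """Eq. (7): predicted person count for one frame's feature vector x."""
    return float(x @ r)
```

On synthetic data where each person contributes a fixed number of foreground pixels per region, the fit recovers the true ratio exactly; on real footage, occlusion makes the counts underestimates, as the error analysis above notes.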

Figure 5: S1: Guessed number of people grouped by the S0 ground truth. (Axes: guessed number of people vs. number of people in the ground truth; series: ground truth, train data, test data.)

Acknowledgements

This work has been supported by the Hungarian Research Fund OTKA 76159 and the European Defense Agency in the MEDUSA project.

References

[1] A. R. Dick and M. J. Brooks, "Issues in Automated Visual Surveillance," in Proc. 7th International Conference on Digital Image Computing: Techniques and Applications, Sydney, 2003, pp. 195-204.
[2] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc., vol. 39, pp. 1-38, 1977.
[3] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, 23-25 June 1999, pp. 246-252.
[4] Cs. Benedek and T. Szirányi, "Bayesian foreground and shadow detection in uncertain frame rate surveillance videos," IEEE Transactions on Image Processing, vol. 17, no. 4, pp. 608-621, 2008.
[5] J. R. Bergen and R. Hingorani, "Hierarchical Motion-Based Frame Rate Conversion," Technical report, David Sarnoff Research Center, Princeton, NJ 08540, 1990.
[6] J. Ferguson, "Variable duration models for speech," in Proc. Symposium on the Application of HMMs to Text and Speech, 1980, pp. 143-179.
[7] Y. Weiss and E. H. Adelson, "A unified mixture framework for motion segmentation: incorporating spatial coherence and estimating the number of models," in Proc. Computer Vision and Pattern Recognition, 1996, pp. 321-326.
[8] W. Zhang, X. Fang, and X. Yang, "Spatiotemporal Gaussian mixture model to detect moving objects in dynamic scenes," J. Electron. Imaging, vol. 16, 2007.
[9] R. Wilson and A. Calway, "Multiresolution Gaussian mixture models for visual motion estimation," in Proc. IEEE International Conference on Image Processing, Oct. 2001, pp. 921-924.
[10] J. Yin, S. Velastin, and A. Davies, "Image processing techniques for crowd density estimation using a reference image," in Proc. 2nd Asia-Pacific Conference on Computer Vision, 1995, pp. 610.

Figure 3: Probability values of the low-level detectors for the regular activity (top), running event (middle) and split event (bottom). Left column: output of the unusual flow detector; right column: output of the unusual magnitude detector. Logarithmic scale is used.

Figure 6: Histograms of error (left) and relative error (right) on the training (top) and test (bottom) data.