Crowd Event Detection and Analysis through Video Surveillance

Introduction

The goal of this project is to automatically process video streams in order to estimate crowd density and detect abnormal crowd events, and thereby to infer whether there is a threat that should be signaled to a human operator.

Crowd density estimation

Crowd density is an important feature of a crowd, and different levels of crowd density call for different levels of attention. Polus et al. [1] classify the level of service of a pedestrian flow into four categories, based on the number of pedestrians per unit area: free flow, restricted flow, dense flow and jammed flow.

The literature on crowd density estimation presents two main approaches: counting by detection and counting by regression. In counting by detection [2][3], each pedestrian in the scene is detected individually using an object detector, and the detected objects are then tracked. The number of people is taken to be the total number of tracks.
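
The final counting step of counting by detection can be sketched as follows. This is only an illustration under stated assumptions: the detector and tracker are taken as given, their output is assumed to be a list of (frame_index, track_id) pairs, and the function name count_people and the min_track_length filter (used here to drop spurious one-frame tracks) are hypothetical additions, not part of the cited methods.

```python
from collections import Counter


def count_people(detections, min_track_length=1):
    """Count people as the number of distinct tracks.

    detections: iterable of (frame_index, track_id) pairs, assumed to come
    from an upstream detector and tracker (not implemented here).
    min_track_length: hypothetical filter; tracks seen in fewer detections
    than this are ignored as likely false positives.
    """
    track_lengths = Counter(track_id for _, track_id in detections)
    return sum(1 for length in track_lengths.values()
               if length >= min_track_length)


# Toy tracker output: track "a" spans 3 frames, "b" spans 2, "c" only 1.
detections = [(0, "a"), (0, "b"), (1, "a"), (1, "b"), (2, "a"), (2, "c")]
print(count_people(detections))                      # all tracks
print(count_people(detections, min_track_length=2))  # short track dropped
```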

Most recent work addresses the counting problem within the counting-by-regression framework [4][5]. These methods aim to learn a direct mapping from global or local low-level features to the number of objects by means of supervised machine learning algorithms such as Support Vector Regression (SVR), Gaussian Process Regression (GPR) or Bayesian Poisson Regression. An alternative is pixel-wise density learning, in which the number of people in a region of interest (ROI) is computed by integrating the crowd density map over that region.
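
To make the density-map alternative concrete, the sketch below integrates (sums) a density map over an ROI. It assumes the density map has already been learned and is given as a 2D grid in which each person contributes a total mass of 1; the function name count_in_roi and the toy data are illustrative only.

```python
def count_in_roi(density_map, roi_mask):
    """Integrate a per-pixel crowd density map over a region of interest.

    density_map: 2D list of non-negative densities (people per pixel).
    roi_mask: 2D list of booleans with the same shape; True marks ROI pixels.
    """
    total = 0.0
    for density_row, mask_row in zip(density_map, roi_mask):
        for density, inside in zip(density_row, mask_row):
            if inside:
                total += density
    return total


# Toy 4x4 density map: two people, each spread over a 2x2 block summing to 1.
density_map = [
    [0.25, 0.25, 0.0, 0.0],
    [0.25, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25, 0.25],
]
# ROI = top three rows: the first person entirely, half of the second.
roi_mask = [[True] * 4, [True] * 4, [True] * 4, [False] * 4]

print(count_in_roi(density_map, roi_mask))  # 1.5
```

Because the estimate is a sum of densities, it need not be an integer; a person lying partly inside the ROI contributes a fractional count.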

Although regression methods have achieved strong performance in recent years, the feature set or the kernel function usually depends on the size of the training set, and overfitting occurs when the feature dimension is high. Learning a regression function therefore remains a challenging problem.

To avoid this problem, we approach crowd density estimation as an image retrieval task. The basic idea is that, given a large training set of annotated pedestrian images, for any test image or frame we can find the several closest images in the training set and estimate the number of people as the average of their annotated counts. Specifically, we first segment the local crowd regions in each frame, and for each region we extract local low-level features such as motion or optical flow. We then use these features as the basis and apply sparse representation to train a well-fitted dictionary with the K-SVD method. At test time, for each given frame or image, we find the closest images in the training set, i.e. those with the minimum reconstruction cost. The final estimate is the average of the counts annotated for the images retrieved from the training set.
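
The retrieve-and-average step can be sketched as below. This is a deliberately simplified illustration, not the proposed method: it ranks training samples by squared Euclidean distance between feature vectors as a stand-in for the sparse reconstruction cost, and it omits segmentation, feature extraction and K-SVD dictionary learning entirely. The function name estimate_count, the parameter k and the toy data are assumptions.

```python
def estimate_count(query, train_features, train_counts, k=3):
    """Estimate the people count of a query by retrieval.

    query: feature vector of the test frame or region.
    train_features: list of feature vectors of annotated training samples.
    train_counts: annotated people count for each training sample.
    k: number of closest training samples to average over.

    Squared Euclidean distance is used here as a simple stand-in for the
    reconstruction cost described in the text.
    """
    costs = [
        sum((q - t) ** 2 for q, t in zip(query, features))
        for features in train_features
    ]
    nearest = sorted(range(len(costs)), key=lambda i: costs[i])[:k]
    return sum(train_counts[i] for i in nearest) / len(nearest)


# Toy training set: 2D features with annotated counts.
train_features = [[1.0, 0.5], [1.1, 0.6], [5.0, 4.0], [0.9, 0.4]]
train_counts = [10, 12, 40, 8]

print(estimate_count([1.0, 0.5], train_features, train_counts, k=3))
```

In this toy example the three closest samples are the sparse-crowd ones, so the dense sample annotated with 40 people does not influence the estimate.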

Abnormal crowd event detection

Anomaly detection, also referred to as outlier detection, aims to identify patterns in a given data set that do not conform to an established normal behavior of the crowd.

Previous work in abnormal video event detection can be categorised into two sets:

Local abnormal event (LAE): the behavior of an individual differs from that of its neighbors.
Global abnormal event (GAE): the behavior of the group as a whole is unusual.

To represent abnormal events, binary features from a background model are adopted in [6][7]. Other methods consider spatio-temporal information, including Histogram of Optical Flow (HOF), spatio-temporal gradients, the social force model, chaotic invariants and mixtures of dynamic textures. Saliency features and graph-based nonlinear dimensionality reduction have also been used.

For anomaly measurement, mainstream algorithms compare a test sample with the training events based on a probabilistic model [8][9]. A variety of statistical models have been used, including the Gaussian model, the Gaussian Mixture Model, the Hidden Markov Model, the Markov Random Field (MRF) or spatio-temporal MRF, and Latent Dirichlet Allocation. With these conventional models, a high-dimensional feature is preferred because it represents the event better, but the amount of training data required grows exponentially with the feature dimension, and collecting enough data for estimation is unrealistic in practice. Thus, the main problem left unsolved by most state-of-the-art methods is how to represent an event with a high-dimensional feature when training data is limited.
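
As an illustration of the simplest of these probabilistic models, the sketch below fits a Gaussian with a diagonal covariance to features of normal events and scores a test sample by its negative log-likelihood. The diagonal-covariance simplification, the function names, the small variance floor and the toy features are all assumptions made for brevity; the cited methods use richer models.

```python
import math


def fit_diagonal_gaussian(samples, variance_floor=1e-6):
    """Estimate per-dimension mean and variance from normal training samples."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    variances = [
        sum((s[d] - means[d]) ** 2 for s in samples) / n + variance_floor
        for d in range(dim)
    ]
    return means, variances


def negative_log_likelihood(x, means, variances):
    """Anomaly score: higher means less likely under the normal model."""
    return sum(
        0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


# Toy 2D features of normal events (e.g. mean flow magnitude and direction).
normal_events = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
means, variances = fit_diagonal_gaussian(normal_events)

score_normal = negative_log_likelihood([1.0, 2.0], means, variances)
score_abnormal = negative_log_likelihood([3.0, 0.0], means, variances)
print(score_normal < score_abnormal)  # True
```

A test sample would be flagged as abnormal when its score exceeds a threshold chosen on held-out normal data. Even this simple model needs one mean and one variance per feature dimension, and a full covariance matrix needs a number of parameters quadratic in the dimension, which hints at why data requirements become a problem for high-dimensional features.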

In our work, we propose to use sparse representation, which can represent high-dimensional samples with less training data; this motivates detecting abnormal events through a sparse reconstruction from normal ones. Our work will mainly focus on how to select the optimal basis and how to choose a dictionary that fits the training set well. Finally, we aim to detect abnormal events through a sparse reconstruction over the normal bases: an event with a high reconstruction cost is considered abnormal.
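
The idea of scoring an event by its sparse reconstruction cost over normal bases can be sketched as follows. This is a minimal illustration under stated assumptions: it uses plain matching pursuit, a simple greedy sparse coder, in place of whichever solver the project adopts, and the dictionary is a hand-written toy rather than one learned from data. The names sparse_reconstruction_cost and n_nonzero are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def sparse_reconstruction_cost(x, atoms, n_nonzero=2):
    """Squared residual norm after greedy matching pursuit.

    x: feature vector of the test event.
    atoms: list of unit-norm dictionary atoms built from normal events.
    n_nonzero: number of greedy steps, i.e. the sparsity budget.
    """
    residual = list(x)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coefficient = scores[best]
        # Remove that atom's contribution from the residual.
        residual = [r - coefficient * a for r, a in zip(residual, atoms[best])]
    return dot(residual, residual)


# Toy dictionary: normal events only vary along the first two dimensions.
normal_bases = [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 0.0])]

cost_normal = sparse_reconstruction_cost([2.0, 1.0, 0.0], normal_bases)
cost_abnormal = sparse_reconstruction_cost([0.1, 0.0, 3.0], normal_bases)
print(cost_normal < cost_abnormal)  # True
```

The normal sample is reconstructed exactly from two atoms, while the abnormal sample has most of its energy in a direction that no normal basis covers, so its residual stays large. Thresholding this cost gives the abnormality decision.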

Conclusion

This project aims to measure crowd density accurately and to detect abnormal events in the crowd, with the hope of improving on state-of-the-art performance.

References

[1] Polus, A., Schofer, J. L., & Ushpiz, A. (1983). Pedestrian flow and level of service. Journal of Transportation Engineering, 109(1), 46-56.
[2] Li, M., Zhang, Z., Huang, K., & Tan, T. (2008, December). Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on (pp. 1-4). IEEE.
[3] Zhao, T., Nevatia, R., & Wu, B. (2008). Segmentation and tracking of multiple humans in crowded environments. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(7), 1198-1211.
[4] Chan, A. B., & Vasconcelos, N. (2012). Counting people with low-level features and Bayesian regression. Image Processing, IEEE Transactions on, 21(4), 2160-2177.
[5] Kong, D., Gray, D., & Tao, H. (2005, September). Counting pedestrians in crowds using viewpoint invariant training. In BMVC.
[6] Benezeth, Y., Jodoin, P. M., Saligrama, V., & Rosenberger, C. (2009, June). Abnormal events detection based on spatio-temporal co-occurences. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 2458-2465). IEEE.
[7] Zhong, H., Shi, J., & Visontai, M. (2004, June). Detecting unusual activity in video. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on (Vol. 2, pp. II-819). IEEE.
[8] Adam, A., et al. (2008). Robust real-time unusual event detection using multiple fixed-location monitors. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(3), 555-560.
[9] Kratz, L., & Nishino, K. (2009, June). Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 1446-1453). IEEE.

Research Highlights