Automatically detecting violence in videos is paramount for enforcing the law and providing society with better policies for safer public places. It may also be essential for protecting minors from accessing inappropriate content online and for helping parents choose suitable movie titles for their children. However, this remains an open problem, as the very definition of violence is subjective and may vary from one society to another. Detecting such nuances in video footage without human supervision is very challenging.
In this paper, the authors explore a fast end-to-end Bag-of-Visual-Words (BoVW)-based framework for violence classification. They adapt Temporal Robust Features (TRoF), a fast spatio-temporal interest point detector and descriptor that is custom-tailored for detecting inappropriate content such as violence. The method holds promise for fast and effective classification in other recognition tasks (e.g., pornography and other inappropriate material). Compared with more complex counterparts for violence detection, it shows similar classification quality while being several times more efficient in runtime and memory footprint.
The explored three-layered BoVW-based framework for video violence classification
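To make the three layers of a generic BoVW pipeline concrete (local description, mid-level encoding, classification), the following is a minimal sketch and not the authors' implementation. It makes several assumptions that do not come from the paper: random synthetic vectors stand in for TRoF descriptors, the codebook is built with plain k-means, encoding uses hard assignment with an L1-normalized histogram, and a nearest-class-mean rule stands in for whatever classifier the paper uses. All function names are hypothetical.

```python
import numpy as np


def build_codebook(descriptors, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k 'visual words'."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every center.
        dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def encode_video(descriptors, codebook):
    """Mid-level layer: hard-assign descriptors, pool into an L1 histogram."""
    dist = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def fit_class_means(histograms, labels):
    """High-level layer (stand-in classifier): one mean histogram per class."""
    classes = np.unique(labels)
    means = np.stack([histograms[labels == c].mean(axis=0) for c in classes])
    return classes, means


def predict(histograms, classes, means):
    """Assign each histogram to the class with the closest mean histogram."""
    dist = ((histograms[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def synthetic_video(label, rng, n=60, dim=8):
    """ASSUMPTION: fake local descriptors; a real system would extract
    spatio-temporal descriptors (e.g., TRoF) from the video instead."""
    return rng.normal(loc=3.0 * label, scale=1.0, size=(n, dim))


def demo(seed=0):
    rng = np.random.default_rng(seed)
    labels = np.array([0, 1] * 10)  # 0 = non-violent, 1 = violent (synthetic)
    videos = [synthetic_video(y, rng) for y in labels]
    codebook = build_codebook(np.vstack(videos), k=16)
    hists = np.stack([encode_video(v, codebook) for v in videos])
    classes, means = fit_class_means(hists, labels)
    return predict(hists, classes, means), labels


predictions, truth = demo()
print("training-set agreement:", (predictions == truth).mean())
```

The sketch only illustrates the data flow from local descriptors to a per-video histogram to a label; the agreement it prints is measured on the same synthetic data it was fit on and says nothing about the quality reported in the paper.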
MOREIRA, D. H.; AVILA, Sandra; PEREZ, Mauricio; MORAES, Daniel; TESTONI, Vanessa; VALLE, Eduardo; GOLDENSTEIN, Siome; ROCHA, Anderson. Temporal Robust Features for Violence Detection. In: IEEE Intl. Winter Conference on Applications of Computer Vision (WACV), 2017, Santa Rosa. p. 1-9.