Trong-Nguyen Nguyen, Sébastien Roy and Jean Meunier
2021 IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Anomaly detection is a key function of vision systems such as surveillance and security. In this work, we present a convolutional neural network (CNN) that supports frame-level detection of anomalies that were not defined when the model was built. Our CNN, named SmithNet, is structured to simultaneously learn commonly occurring textures and their corresponding motion. Its architecture combines (1) an encoder that extracts motion-texture coherence from each video frame and (2) two decoders that, from the estimated coherence, separately reconstruct the input and predict its typical motion. We also introduce an encoding block designed specifically for the task of anomaly detection. The network is optimized only on data of normal events and is expected to identify events that are unusual, i.e., that have not been seen before. In experiments on 8 benchmark datasets covering different environments and various anomalous events, our network performs competitively with or outperforms current state-of-the-art approaches.
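As a rough illustration of how a network with one reconstruction output and one motion-prediction output could yield a frame-level anomaly score, the sketch below sums the two errors for each frame and rescales the scores within a video. This is not SmithNet's scoring rule, which the abstract does not specify. The names `frame_scores`, `min_max_normalize`, and `motion_weight`, the use of mean squared error, and the `model(frame)` interface are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# A frame (or a motion map) is represented as rows of float values.
Frame = List[List[float]]
# Hypothetical model interface: frame -> (reconstructed frame, predicted motion).
Model = Callable[[Frame], Tuple[Frame, Frame]]


def mean_squared_error(a: Frame, b: Frame) -> float:
    """Average squared difference between two equally sized 2-D arrays."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def frame_scores(
    frames: Sequence[Frame],
    motions: Sequence[Frame],
    model: Model,
    motion_weight: float = 1.0,  # assumed weighting, not taken from the paper
) -> List[float]:
    """Score each frame by reconstruction error plus weighted motion error."""
    scores = []
    for frame, motion in zip(frames, motions):
        reconstructed, predicted_motion = model(frame)
        texture_error = mean_squared_error(frame, reconstructed)
        motion_error = mean_squared_error(motion, predicted_motion)
        scores.append(texture_error + motion_weight * motion_error)
    return scores


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; returns zeros if all scores are equal."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    # Toy stand-in for a trained model: it always outputs the "normal"
    # appearance and zero motion, so unusual frames produce larger errors.
    normal = [[0.5, 0.5], [0.5, 0.5]]
    still = [[0.0, 0.0], [0.0, 0.0]]
    toy_model = lambda frame: (normal, still)

    frames = [normal, [[1.0, 1.0], [1.0, 1.0]]]
    motions = [still, [[1.0, 0.0], [0.0, 0.0]]]
    print(min_max_normalize(frame_scores(frames, motions, toy_model)))
```

A higher normalized score would mark a frame as more likely anomalous. A threshold on that score, or a ranking metric computed over it, would then give the frame-level decision.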