Hybrid CAE-VAE for unsupervised anomaly detection in log file systems

A Wadekar, T Gupta, R Vijan, F Kazi
2019 10th International Conference on Computing, Communication and …, 2019 - ieeexplore.ieee.org
Anomaly detection is of paramount importance, especially in big data systems, since these systems log abruptly changing events that generate consequential outliers in their logs. These logs are highly unstructured, so traditional machine learning methods fail to detect anomalies. Prominent approaches include supervised techniques, which require labelled data, and unsupervised techniques, which rely on some error metric. Supervised methods can only capture the kinds of anomalies present in the dataset, so they fail for any new type of anomaly. Hence the need arises for unsupervised learning techniques with an easy-to-interpret anomaly score. In this paper, we propose a solution using a hybrid Convolutional Autoencoder-Variational Autoencoder (CAE-VAE) architecture for discrete event sequences, which are obtained by processing log files using log keys derived from individual entries. We evaluate our model on Hadoop Distributed File System (HDFS) logs. Unlike most traditional autoencoder approaches, which use reconstruction error for anomaly detection, our proposed model derives a likelihood metric that can be interpreted as an anomaly score. We also present a comparative analysis of our model against a supervised CNN model and an unsupervised CAE model, and show empirically that our model achieves better results.
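
The abstract mentions turning log entries into log keys and then into discrete event sequences, without giving the procedure. The following is only a minimal sketch of that general idea, not the authors' pipeline. The masking rules, the grouping of entries by HDFS block ID, and the names `to_template` and `build_sequences` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed masking rules: variable fields are replaced by placeholders so that
# entries produced by the same log statement collapse to one template.
_BLOCK_ID = re.compile(r"blk_-?\d+")
_IP = re.compile(r"\d+\.\d+\.\d+\.\d+(:\d+)?")
_NUM = re.compile(r"\b\d+\b")


def to_template(message):
    """Mask block IDs, IP addresses and bare numbers in one log message."""
    message = _BLOCK_ID.sub("<BLK>", message)
    message = _IP.sub("<IP>", message)
    message = _NUM.sub("<NUM>", message)
    return message


def build_sequences(lines):
    """Return (template -> integer log key, block ID -> list of log keys).

    Grouping by block ID is an assumption of this sketch; entries without
    a block ID are skipped.
    """
    key_ids = {}
    sequences = defaultdict(list)
    for line in lines:
        match = _BLOCK_ID.search(line)
        if match is None:
            continue
        template = to_template(line)
        # Assign the next unused integer the first time a template is seen.
        key = key_ids.setdefault(template, len(key_ids))
        sequences[match.group()].append(key)
    return key_ids, dict(sequences)


if __name__ == "__main__":
    # Made-up sample entries in the style of HDFS logs.
    sample = [
        "Receiving block blk_123 src: /10.0.0.1:50010 dest: /10.0.0.2:50010",
        "Received block blk_123 of size 67108864 from /10.0.0.1",
        "Receiving block blk_-456 src: /10.0.0.3:50010 dest: /10.0.0.2:50010",
    ]
    keys, seqs = build_sequences(sample)
    print(keys)
    print(seqs)
```

Each resulting integer sequence is the kind of discrete event sequence a sequence model could consume; how the paper encodes these sequences for the CAE-VAE is not stated in the abstract.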