My team recently finished a paper on anomaly detection. We propose a novel unsupervised anomaly detection model, LAG, built from LDA (Latent Dirichlet Allocation), an autoencoder, and a GMM (Gaussian Mixture Model). The model works on both structured and unstructured data, so it can be applied to anomaly detection tasks across different industries. In particular, we show how to apply it to financial transactions. On public benchmark datasets, LAG outperforms state-of-the-art anomaly detection models, improving the F1 score by more than 8%.
Our work makes three contributions. First, we propose a way to tokenize transaction data: each transaction is converted to a word, and a batch of transactions becomes a document that represents a customer's financial behavior. Second, we handle unstructured data with LDA, which maps text or any other discrete data into a low-dimensional space; the resulting topic vectors are useful features for downstream tasks. Third, we combine LDA, an autoencoder, and a GMM into a single model for anomaly detection.
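To make the tokenization idea concrete, here is a minimal sketch of turning transactions into words and grouping them into per-customer documents. This post does not spell out the actual scheme used in the paper, so everything specific here is an assumption for illustration: the field names (`customer_id`, `type`, `channel`, `amount`), the amount bucket edges, and the helper names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical bucket edges for the transaction amount (illustrative only,
# not taken from the paper). Each entry is (exclusive upper bound, label).
AMOUNT_BINS = [(10, "low"), (100, "mid"), (1000, "high")]


def amount_bucket(amount):
    """Map a continuous amount to a coarse discrete label."""
    for upper, label in AMOUNT_BINS:
        if amount < upper:
            return label
    return "vhigh"


def transaction_to_word(tx):
    """Convert one transaction into a single token (a 'word')."""
    return f"{tx['type']}_{tx['channel']}_{amount_bucket(tx['amount'])}"


def build_documents(transactions):
    """Group the words by customer, so each customer becomes one 'document'."""
    docs = defaultdict(list)
    for tx in transactions:
        docs[tx["customer_id"]].append(transaction_to_word(tx))
    return dict(docs)


# Made-up sample data with hypothetical fields.
transactions = [
    {"customer_id": "c1", "type": "debit", "channel": "atm", "amount": 40.0},
    {"customer_id": "c1", "type": "debit", "channel": "online", "amount": 5.5},
    {"customer_id": "c2", "type": "credit", "channel": "wire", "amount": 2500.0},
]

documents = build_documents(transactions)
print(documents)
```

Documents built this way could then be turned into bag-of-words counts and passed to an LDA implementation to obtain a topic vector for each customer.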