My new paper: Anomaly Detection based on LDA, Autoencoder and GMM

My team recently finished a paper on anomaly detection. We propose LAG, a novel unsupervised anomaly detection model built on Latent Dirichlet Allocation (LDA), an autoencoder, and a Gaussian mixture model (GMM). The model works on both structured and unstructured data, so it can serve anomaly detection tasks across different industries. In particular, we show how to apply it to financial transactions. On public benchmark datasets, LAG outperforms state-of-the-art anomaly detection models, improving the F1 score by more than 8%.

Our work makes three contributions. First, we propose a tokenization scheme for transaction data that converts each transaction into a word and each batch of transactions into a document representing a customer's financial behavior. Second, we handle unstructured data with LDA, which maps text or any other discrete data into a low-dimensional space; the resulting topic vectors are useful features for downstream tasks. Third, we combine LDA, the autoencoder, and the GMM into a single model for anomaly detection.
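To make the tokenization idea concrete, here is a minimal sketch of how a transaction might become a word and a customer's transactions a document. It is not the scheme from the paper: the field names (`customer_id`, `direction`, `category`, `amount`), the amount bins, and the helper functions are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical amount bins (upper bound, label); the paper's actual
# discretization is not described in this post.
AMOUNT_BINS = [(100.0, "small"), (1000.0, "medium"), (10000.0, "large")]


def amount_bucket(amount):
    """Map a numeric amount to a coarse label using the assumed bins."""
    for upper, label in AMOUNT_BINS:
        if amount < upper:
            return label
    return "xlarge"


def transaction_to_word(tx):
    """Encode one transaction as a single token, e.g. 'debit_grocery_small'."""
    return "_".join([tx["direction"], tx["category"], amount_bucket(tx["amount"])])


def build_documents(transactions):
    """Group tokens by customer in input order: one document per customer."""
    docs = defaultdict(list)
    for tx in transactions:
        docs[tx["customer_id"]].append(transaction_to_word(tx))
    return {cid: " ".join(words) for cid, words in docs.items()}


# Made-up example records, not real data.
transactions = [
    {"customer_id": "c1", "direction": "debit", "category": "grocery", "amount": 42.5},
    {"customer_id": "c1", "direction": "debit", "category": "travel", "amount": 1800.0},
    {"customer_id": "c2", "direction": "credit", "category": "salary", "amount": 5200.0},
]
documents = build_documents(transactions)
print(documents)
```

Documents built this way could then go through a bag-of-words count step and into a topic model such as LDA, in the same way as ordinary text.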


Published by frank xu

I am a data science practitioner. I love math, artificial intelligence, and big data. I look forward to sharing my experience with all data science enthusiasts.
