Spotting Financial Irregularities with Project Fusion
A Data Reply article by Senior Data Scientist Rishan Shah

In the Financial Services Industry (FSI), money and transactions must be monitored with appropriate due diligence and clear transparency, so that unexpected and irreversible outcomes are avoided. To achieve this, the industry has become heavily regulated and has reached a consensus on the tools and technologies used to do so. After all, if it works, there is no reason to change it. However, the term “works” deserves scrutiny and a cautious recognition of its drawbacks, since an unfortunate consequence of the regulations is that we are inundated with vast quantities of false positives. In the past, the FSI combated this with manual procedures, sifting through transactions to identify those most likely to be suspicious. However, as the world of financial transactions grows increasingly complex and intertwined, the tools that once seemed efficient are starting to show signs of strain.

Where regulation allows some freedom in how post-alert data is used, financial institutions have turned to Machine Learning and Artificial Intelligence to alleviate the burden. Nevertheless, the challenge of distinguishing genuine alerts from false ones remains ever-present, if not more pressing. Enter Supervised Clustering using SHAP values. Although experimental and novel, this method aims to bridge the gap between rule-based models and Artificial Intelligence: it retains explainability and algorithmically uses pre-existing knowledge to separate true alerts from false ones more effectively. As financial institutions search for ways to improve their transaction monitoring systems, approaches like Supervised Clustering may well be the path forward, balancing regulatory compliance, operational efficiency, and effective fraud detection.


Understanding Supervised Clustering Using SHAP values

Unlike classical clustering techniques, supervised clustering uses a multistep model that takes into account prior information and features, engineered by us, that may relate to how data points are classified. In our case, we can extract information on how anomalous each data point is and use it as the basis for SHAP (SHapley Additive exPlanations) values. One way to do this is with Isolation Forest, an unsupervised algorithm that provides an “Isolation Score”, a quantifiable measure of how anomalous a specific data point is. Using this score as the target turns the task into a regression problem that any of the traditional machine learning algorithms can solve. However, because our data is highly imbalanced, decision trees are the better option given their resilience to such data.
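A minimal sketch of this step, assuming scikit-learn and synthetic data in place of real alert features. scikit-learn's `IsolationForest.score_samples` returns the negated anomaly score (lower means more abnormal), so the sign is flipped to make higher values mean “more anomalous” before using it as a regression target. The feature count, the outlier shift and `max_depth=6` are illustrative choices, not values from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeRegressor

# Hypothetical alert features: 950 ordinary alerts plus 50 shifted outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (950, 4)), rng.normal(4.0, 1.0, (50, 4))])

# Unsupervised step: Isolation Forest anomaly scores.
iso = IsolationForest(n_estimators=200, random_state=0).fit(X)
# score_samples returns the negated anomaly score (lower = more abnormal),
# so flip the sign to get a target where higher = more anomalous.
isolation_score = -iso.score_samples(X)

# Supervised step: regress the score on the original features with a tree.
tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, isolation_score)
r2 = tree.score(X, isolation_score)  # in-sample R^2, for illustration only
print(f"In-sample R^2 of the tree: {r2:.3f}")
```

The outliers receive higher isolation scores on average than the ordinary alerts, and the fitted tree is what the SHAP step later explains.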

To boost predictive capacity further, it makes sense at this stage to resample the data. We use the Synthetic Minority Oversampling Technique (SMOTE) to upsample the minority class of truly fraudulent cases.
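SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority-class neighbours. In practice one would typically call `imblearn.over_sampling.SMOTE(...).fit_resample(X, y)`; the hypothetical `smote` function below is a minimal from-scratch sketch using scikit-learn's `NearestNeighbors`, assuming binary labels where 1 marks a truly fraudulent alert.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote(X, y, minority_label=1, k_neighbors=5, random_state=0):
    """Minimal SMOTE sketch: oversample the minority class up to the majority size."""
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0:
        return X, y
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k_neighbors + 1).fit(X_min)
    _, neighbour_idx = nn.kneighbors(X_min)
    # Pick a random minority point, then one of its k true neighbours (skip column 0).
    base = rng.integers(0, len(X_min), size=n_needed)
    chosen = neighbour_idx[base, rng.integers(1, k_neighbors + 1, size=n_needed)]
    # Place each synthetic point at a random position on the segment between them.
    gap = rng.random((n_needed, 1))
    X_syn = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_res = np.vstack([X, X_syn])
    y_res = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_res, y_res


# Hypothetical data: 200 false alerts (0) and 20 truly fraudulent alerts (1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(3.0, 0.5, (20, 3))])
y = np.array([0] * 200 + [1] * 20)
X_res, y_res = smote(X, y)
print(np.bincount(y_res))  # [200 200]
```

The article does not say how the synthetic rows receive an isolation-score target for the regression tree; one option, stated here purely as an assumption, is to score the resampled data with the already-fitted Isolation Forest.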

Once a regression decision tree has been fitted to the data, we can pass the model to an explainer to calculate SHAP values for every data point. The novelty of this method is that all of the steps described above are pre-processing steps, so we need not split the data into training and test sets as we would with traditional Machine Learning models. We then use the SHAP values directly in the clustering, using a K-Nearest Neighbours algorithm.
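The sketch below strings the pre-processing together and computes SHAP values for the fitted tree. To stay self-contained it computes exact interventional Shapley values by enumerating every feature subset against a background sample (the hypothetical `exact_shap` helper). This is only feasible for a few features; in practice `shap.TreeExplainer` would compute tree SHAP values efficiently. The final step assumes that “K-Nearest Neighbours” means a `KNeighborsClassifier` fitted on the SHAP vectors with known true/false alert labels. The data, labels, background size and `n_neighbors=5` are all illustrative, and SMOTE is omitted for brevity.

```python
import itertools
import math

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor


def exact_shap(predict, X, background):
    """Exact interventional Shapley values via brute-force subset enumeration.

    v(S) is the mean prediction when the features in S take the row's values and
    the remaining features take values from each background row. Cost grows as
    2**d model evaluations per row, so this suits only a handful of features.
    """
    n, d = X.shape
    phi = np.zeros((n, d))
    for r, x in enumerate(X):
        v = {}
        for size in range(d + 1):
            for S in itertools.combinations(range(d), size):
                Z = background.copy()
                Z[:, list(S)] = x[list(S)]
                v[S] = predict(Z).mean()
        for i in range(d):
            others = [j for j in range(d) if j != i]
            for size in range(d):
                # Shapley weight |S|! (d - |S| - 1)! / d!
                w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
                for S in itertools.combinations(others, size):
                    phi[r, i] += w * (v[tuple(sorted(S + (i,)))] - v[S])
    return phi


# Hypothetical alerts: 200 false alerts (0) and 20 true alerts (1), 4 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 4)), rng.normal(2.5, 1.0, (20, 4))])
y = np.array([0] * 200 + [1] * 20)

# Pre-processing: isolation score as target, regression tree, then SHAP values.
target = -IsolationForest(random_state=0).fit(X).score_samples(X)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, target)
background = X[rng.choice(len(X), size=50, replace=False)]
shap_values = exact_shap(tree.predict, X, background)

# Assumed final step: a K-nearest-neighbours classifier in SHAP space,
# scored out-of-fold so each alert is labelled by neighbours it was not fitted with.
pred = cross_val_predict(KNeighborsClassifier(n_neighbors=5), shap_values, y, cv=5)
```

A useful sanity check is Shapley efficiency: each row's SHAP values sum to the tree's prediction for that row minus the mean prediction over the background sample.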

Traditional vs. Supervised Clustering Using SHAP

The above figure shows the potential benefit of using SHAP values: they reduce the noise in the data. Furthermore, unlike our experiments, the figure applies a dimensionality reduction, which helps visualise how this procedure, given prior information, tightly compacts the clusters and so makes the downstream clustering easier. This dimensionality reduction is not necessary, however; the information can be kept in its original form and still give an appropriate result. In our experiments on alerting data, we were able to discern between true and false alerts with a weighted F1 score of 86%, showing great potential for this method.
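The weighted F1 score averages the per-class F1 scores, weighting each class c by its share of the true labels: weighted F1 = sum over classes c of (n_c / N) × F1_c. Because the weights follow class support, the majority class (typically false alerts) dominates the score on imbalanced alerting data. A small worked example in pure Python with hypothetical labels; `weighted_f1` is an illustrative helper intended to match scikit-learn's `f1_score(..., average="weighted")`.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of the per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n_true * f1  # weight each class's F1 by its support
    return total / len(y_true)


# Hypothetical alerts: 0 = false alert, 1 = true alert.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6667
```

Here false alerts have F1 = 0.75 with support 4 and true alerts have F1 = 0.5 with support 2, so the weighted F1 is (4 × 0.75 + 2 × 0.5) / 6 ≈ 0.6667.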


Our Experience – Working with a large multinational banking institution

Data Reply worked with a multinational bank to improve its fraud detection service. The bank had been overcommitting resources to this work and was struggling with a large backlog. Drawing on our extensive knowledge of data science and statistics, we not only contributed to the rules used to initially identify fraudulent cases but also successfully implemented the clustering model described above.

This helped automate the review process, reducing operational costs while maintaining customer retention through improved trust in the system. Anti-Money Laundering (AML) is the primary use case for this technology, but we also discussed other applications. For example, as a by-product, the clustering model could be used to offer specific products to individuals in the same cluster. Further, rather than focussing on individual transactions, we were also able to use this technology to provide overall risk profiles for individual customers.


Future and Conclusion

Fraud detection will remain a problem to be solved as long as people commit fraud. An arms race has developed over recent decades in which, thanks to technological advancements, both consumers and institutions have completely changed the way they deal with this problem, from both a defensive and an offensive standpoint. Our role as data scientists is to stay up to date with all of this by researching and adopting techniques both new and old. In this instance, we use this model both for its higher F1 score relative to older models such as a plain XGBoost and for the improved explainability that SHAP values provide.

Data Reply aims to be at the frontier, contributing impactful solutions to the financial industry and wherever else this technology may be applied. We keep abreast of the latest advances in this and many other areas. At the same time, we experiment with these new models and will continue this tradition of expertise and innovation for years to come.


Is your business seeking advanced solutions for more precise fraud detection and AML compliance?

At Data Reply, our experts are adept at implementing Supervised Clustering with SHAP values, significantly enhancing the accuracy of transaction monitoring systems. Step into a new era of financial intelligence with our leading-edge AI applications.

Reach out to us at info.data.uk@reply.com or directly connect with our Data Science Manager, Perumal S K.



Rishan Shah, Senior Data Scientist, Data Reply UK

