Federated Fraud Detection

Privacy-preserving fraud detection using TGNNs in an HFL environment

Credit card fraud is a persistent challenge for financial institutions, requiring sophisticated detection methods capable of analyzing large volumes of transactional data in real time. An estimated 52 million Americans had fraudulent charges on their credit or debit cards in 2023, totaling over $5 billion in unauthorized purchases [1].

Predicting and preventing credit card fraud is challenging because:

  1. Banks are typically prohibited from sharing their transaction statistics due to data security and privacy regulations.
  2. Credit card transaction datasets are extremely imbalanced, with far fewer examples of fraudulent purchases than legitimate ones.
  3. Smaller financial institutions, especially in developing countries, often struggle with limited access to extensive transaction data.
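
To make the second point concrete, here is a minimal sketch of how class imbalance distorts a naive evaluation. The 0.2% fraud rate and the helper names (`accuracy`, `recall`, `balanced_class_weights`) are illustrative assumptions, not values or code from this project. The weighting formula is the common inverse-frequency heuristic, n_samples / (n_classes * class_count), which is one possible mitigation among several.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (fraud) that were flagged."""
    flagged = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return flagged / actual if actual else 0.0


def balanced_class_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(y_true)
    n_samples, n_classes = len(y_true), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical dataset: 2 fraudulent (1) and 998 legitimate (0) transactions.
labels = [1] * 2 + [0] * 998

# A useless model that labels every transaction as legitimate.
always_legit = [0] * len(labels)

baseline_accuracy = accuracy(labels, always_legit)  # high, despite catching no fraud
baseline_recall = recall(labels, always_legit)      # no fraud detected
weights = balanced_class_weights(labels)            # fraud errors weighted far more heavily

print(baseline_accuracy, baseline_recall, weights)
```

Under these assumed numbers, the always-legitimate model scores very high accuracy while detecting no fraud at all, which is why imbalanced problems are usually judged with metrics such as recall and trained with class weighting or resampling.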

Traditional machine learning models often struggle to capture the complex temporal and relational patterns present in transaction networks. Temporal Graph Neural Networks (TGNNs) have emerged as powerful tools for modeling such data, effectively integrating temporal dynamics with the structural relationships between entities.

However, deploying TGNNs on sensitive financial data raises significant privacy concerns. Horizontal Federated Learning (HFL) offers a solution by enabling multiple institutions to collaboratively train a shared model without exchanging raw data. In this post, we delve into the mathematical foundations of integrating TGNNs within an HFL framework to enhance credit card fraud detection while preserving data privacy.

Project overview:

  • Implemented and trained a dynamic graph neural network (DGNN) in a horizontal federated learning (HFL) setting for privacy-preserving detection of anomalous (fraudulent) credit card transactions, achieving \(>96\%\) accuracy on datasets spanning upwards of 20 clients.
  • Securely aggregated model weights from multiple banks without sharing customer data, reducing the risk of data breaches while maintaining model performance.
  • Leveraged multi-GPU training for each participating bank and implemented efficient model aggregation, achieving a 20x speedup in overall training time and enabling near real-time fraud detection capabilities.
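
The weight aggregation step described above can be sketched as follows. The aggregation rule used in this project is not specified here, so this example assumes sample-size-weighted averaging in the style of FedAvg. The function name `federated_average`, the parameter names, and the two example banks are hypothetical. Plain Python lists stand in for model tensors, and the secure part of the aggregation (for example masking or encryption of updates) is not shown.

```python
from typing import Dict, List

# Parameter name -> flattened parameter values (a stand-in for model tensors).
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        acc = [0.0] * len(client_weights[0][name])
        for weights, n_samples in zip(client_weights, client_sizes):
            share = n_samples / total  # this client's contribution to the global model
            for i, value in enumerate(weights[name]):
                acc[i] += share * value
        averaged[name] = acc
    return averaged


# Hypothetical round: two banks send locally trained weights, never raw transactions.
bank_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
bank_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

# Bank B trained on three times as many transactions, so it gets three times the weight.
global_weights = federated_average([bank_a, bank_b], [100, 300])
print(global_weights)
```

Only the parameter dictionaries cross institutional boundaries in this sketch. In a PyTorch setting the same averaging would typically be applied to each entry of the clients' model state dictionaries before the result is broadcast back for the next training round.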

Languages & Libraries: Python, PyTorch, NumPy, scikit-learn, pandas, matplotlib

Prerequisites: graph neural networks, deep learning


Technical writeup in progress…