AI in Financial Services: Real-Time Fraud Detection on Cloud-Native GPU Clusters
DOI: https://doi.org/10.32996/jcsts.2025.7.7.16
Keywords: Real-time fraud detection, GPU acceleration, Cloud-native architecture, Multi-Instance GPU, Graph neural networks
Abstract
This article presents a cloud-native architecture for real-time fraud detection in financial services that relies on GPU acceleration. The implementation uses NVIDIA A100 GPUs with Multi-Instance GPU (MIG) technology to process large transaction volumes at millisecond latency without sacrificing accuracy. Deployed on Kubernetes, the solution applies a two-tiered classification strategy: a lightweight logistic regression model performs initial screening, and gradient-boosting and neural network models perform deeper analysis. Payment data flows through Apache Kafka, is validated against Avro schemas, and is enriched with contextual information from Redis caches and an Apache Iceberg feature repository. Inference services are packaged with NVIDIA Triton and exposed over gRPC, which reduces latency and improves cost effectiveness compared with traditional CPU-based approaches. Horizontal pod scaling driven by GPU metrics adjusts resources automatically during peak periods. The architecture lets banks and payment processors meet stringent fraud detection requirements while staying able to respond quickly to new threats across digital channels and to sudden shifts in regulation.
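The two-tiered classification strategy can be illustrated with a minimal Python sketch. It is not the article's implementation: the feature names, weights, bias, and thresholds are hypothetical, and `stub_deep_model` is a placeholder standing in for the gradient-boosting or neural network model that would be served separately in the described system.

```python
import math
from typing import Callable, Dict

Features = Dict[str, float]

# Hypothetical tier-one coefficients; a real deployment would load trained values.
SCREEN_WEIGHTS: Features = {"amount_zscore": 1.2, "new_device": 0.8, "foreign_ip": 0.6}
SCREEN_BIAS = -3.0
SCREEN_THRESHOLD = 0.2  # assumed cut-off above which a transaction is escalated


def screen_score(features: Features) -> float:
    """Tier one: logistic regression probability from a weighted feature sum."""
    z = SCREEN_BIAS + sum(w * features.get(name, 0.0) for name, w in SCREEN_WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def stub_deep_model(features: Features) -> float:
    """Placeholder for the heavier tier-two model; returns a score in [0, 1]."""
    raw = 0.3 * features.get("amount_zscore", 0.0) + 0.4 * features.get("new_device", 0.0)
    return max(0.0, min(1.0, raw))


def classify(
    features: Features,
    deep_model: Callable[[Features], float] = stub_deep_model,
    deep_threshold: float = 0.5,
) -> dict:
    """Run the cheap screen first and call the deep model only when escalated."""
    p1 = screen_score(features)
    if p1 < SCREEN_THRESHOLD:
        # Low-risk traffic exits here without touching the expensive model.
        return {"tier": 1, "score": p1, "fraud": False}
    p2 = deep_model(features)
    return {"tier": 2, "score": p2, "fraud": p2 >= deep_threshold}


if __name__ == "__main__":
    print(classify({"amount_zscore": 0.1}))
    print(classify({"amount_zscore": 3.0, "new_device": 1.0, "foreign_ip": 1.0}))
```

The point of the split is that most transactions are resolved by the inexpensive first tier, so the heavier model only scores the fraction that is escalated.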