Performance-Focused Memory Subsystem Verification in Modern GPUs
DOI: https://doi.org/10.32996/jcsts.2025.4.1.79
Keywords: GPU memory subsystem, performance verification, bottleneck detection, simulation-based verification, pre-silicon optimization, memory hierarchy
Abstract
Modern GPUs have shifted from compute-bound to memory-bound performance bottlenecks, particularly for AI and high-performance computing workloads. Traditional functional verification methods cannot detect performance-critical issues that emerge only under real workload conditions, and this gap widens as memory hierarchies grow more complex, with multiple cache levels and advanced interconnects. This article presents an end-to-end verification framework that combines cycle-accurate simulation with detailed memory hierarchy instrumentation to capture stall events, memory latencies, and cache behavior. The framework's bottleneck detection methods, which use both rule-based and machine learning approaches, achieve high detection accuracy with low false positive rates across diverse GPU workloads. Instead of synthetic benchmarks, the framework uses trace-driven simulation to replay real workload behavior, so verification scenarios closely match production memory access patterns. Integrated performance regression testing tracks key metrics across design iterations, preventing unintended performance degradation during optimization. Pre-silicon optimization capabilities let architecture teams explore design alternatives, including cache organizations, memory hierarchy configurations, and interconnect topologies, with confidence before costly silicon implementation. This significantly reduces the risk of discovering fundamental performance bottlenecks after tape-out.
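The abstract does not specify the framework's detection rules or regression criteria, so the following is only a minimal sketch of what the rule-based side of bottleneck detection and a metric regression gate might look like. It assumes per-kernel counters exported by a simulator. The `MemoryCounters` fields, the threshold values, and the functions `detect_bottlenecks` and `regressions` are all hypothetical illustrations, not the authors' implementation, and the machine learning detector is not modeled at all.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryCounters:
    """Hypothetical per-kernel counters from an instrumented memory-hierarchy simulation."""
    total_cycles: int
    mem_stall_cycles: int      # cycles in which execution stalled waiting on memory
    l2_hits: int
    l2_misses: int
    avg_dram_latency: float    # average DRAM access latency, in cycles


# Illustrative thresholds only; they are assumptions, not values from the article.
STALL_FRACTION_LIMIT = 0.5
L2_MISS_RATE_LIMIT = 0.3
DRAM_LATENCY_LIMIT = 600.0


def miss_rate(hits: int, misses: int) -> float:
    """Fraction of accesses that missed; 0.0 when there were no accesses."""
    total = hits + misses
    return misses / total if total else 0.0


def detect_bottlenecks(c: MemoryCounters) -> list[str]:
    """Apply simple threshold rules to the counters and return labels for each rule that fires."""
    findings = []
    stall_fraction = c.mem_stall_cycles / c.total_cycles if c.total_cycles else 0.0
    if stall_fraction > STALL_FRACTION_LIMIT:
        findings.append("memory-bound")
    if miss_rate(c.l2_hits, c.l2_misses) > L2_MISS_RATE_LIMIT:
        findings.append("l2-capacity")
    if c.avg_dram_latency > DRAM_LATENCY_LIMIT:
        findings.append("dram-latency")
    return findings


def regressions(baseline: dict[str, float], current: dict[str, float],
                tolerance: float = 0.05) -> list[str]:
    """Flag lower-is-better metrics whose relative increase over the baseline exceeds tolerance."""
    flagged = []
    for name, base in baseline.items():
        if name in current and base > 0 and (current[name] - base) / base > tolerance:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = MemoryCounters(total_cycles=1_000_000, mem_stall_cycles=620_000,
                            l2_hits=700_000, l2_misses=200_000, avg_dram_latency=450.0)
    print(detect_bottlenecks(sample))  # prints ['memory-bound']
    print(regressions({"cycles": 100.0, "dram_latency": 400.0},
                      {"cycles": 104.0, "dram_latency": 450.0}))  # prints ['dram_latency']
```

Fixed thresholds keep each rule easy to audit, which is one reason rule-based detection is often paired with a learned model. The learned model can capture interactions between counters that single-metric rules miss.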

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.