Global–Local Attention Modeling for Reliable Multiclass Kidney Disease Classification from CT Images
DOI: https://doi.org/10.32996/jmhs.2026.7.5.6

Keywords: Kidney CT analysis, multiclass classification, vision transformer, explainable artificial intelligence, Grad-CAM, health informatics

Abstract
Automated analysis of kidney abnormalities from computed tomography (CT) has gained increasing importance as imaging volumes grow and radiological workloads intensify. Despite recent progress, robust multiclass classification remains challenging due to overlapping visual characteristics, acquisition variability, and class imbalance across renal conditions. In this work, we present an attention-driven framework for multiclass kidney disease classification from CT images. The proposed approach is based on a Vision Transformer (ViT-B/16) architecture that explicitly models global anatomical context while preserving discriminative local renal features. A comprehensive evaluation is conducted against established and modern convolutional models, including ResNet50, DenseNet121, EfficientNetV2-S, and ConvNeXt-Tiny, using a CT kidney dataset containing 12,446 images spanning normal, cyst, stone, and tumor classes. The proposed model achieves the best overall performance, with 98.90% accuracy and a PR-AUC of 99.23%, demonstrating strong class-wise discrimination under imbalance. To promote transparency, gradient- and attention-based explainability techniques are employed to visualize the lesion-relevant regions influencing predictions. The results indicate that transformer-based modeling offers an effective and interpretable solution for reliable CT-based kidney disease screening.
License

Copyright (c) 2026. This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
