Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains
DOI: https://doi.org/10.32996/jcsts.2025.7.12.45

Keywords:
AI Security, Backdoor Detection, Large Language Models, Cross-LLM Generalization, Behavioral Anomaly Detection

Abstract
As AI agents increasingly become integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. This paper presents the first systematic study of cross-LLM behavioral backdoor detection in AI agent supply chains, evaluating generalization across six production LLMs: GPT-5.1, Claude Sonnet 4.5, Grok 4.1, Llama 4 Maverick, GPT-OSS 120B, and DeepSeek Chat V3.1. Through 1,198 execution traces and 36 cross-model experiments, we identify a critical finding: single-model detectors achieve 92.7% accuracy within their training distribution but only 49.2% across different LLMs, a 43.4 percentage point generalization gap that leaves cross-model detection no better than random guessing. Our analysis reveals that this gap stems from model-specific behavioral signatures, particularly in temporal features with a coefficient of variation exceeding 0.8, while structural features remain stable across architectures. We demonstrate that a simple model-aware detection strategy, incorporating model identity as an additional feature, achieves 90.6% accuracy across all evaluated models. These findings establish that organizations using multiple LLMs cannot rely on single-model detectors and instead require unified detection strategies. We release our multi-LLM trace dataset and detection framework to enable reproducible research in this emerging area.
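For readers who want a concrete picture of the model-aware strategy the abstract describes, the sketch below shows one plausible realization, assuming a scikit-learn-style pipeline. The feature layout, the coefficient-of-variation screen for unstable temporal features, the `add_model_identity` helper, and the random-forest classifier are all illustrative assumptions, not the authors' released detection framework; the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

def add_model_identity(trace_features, model_ids, encoder=None):
    """Append a one-hot model-identity block to behavioral trace features."""
    ids = np.asarray(model_ids).reshape(-1, 1)
    if encoder is None:
        encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
        identity = encoder.fit_transform(ids)
    else:
        identity = encoder.transform(ids)
    return np.hstack([trace_features, identity]), encoder

# Synthetic stand-in for the paper's 1,198 execution traces: 12 behavioral
# features per trace, the LLM that produced it, and a binary backdoor label.
rng = np.random.default_rng(0)
X = rng.random((1198, 12))
models = rng.choice(["gpt-5.1", "claude-sonnet-4.5", "llama-4"], size=1198)
y = rng.integers(0, 2, size=1198)

# Screen for model-specific features: coefficient of variation of per-model
# feature means across LLMs; CV > 0.8 flags unstable (e.g., temporal) ones.
per_model_means = np.stack([X[models == m].mean(axis=0)
                            for m in np.unique(models)])
cv = per_model_means.std(axis=0) / (per_model_means.mean(axis=0) + 1e-9)
unstable = np.flatnonzero(cv > 0.8)

# Model-aware detector: a single classifier over traces from all LLMs,
# with model identity appended as an extra input feature.
X_aware, enc = add_model_identity(X, models)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aware, y)
```

On real traces, the same encoder would be reused at inference time so a trace from any known LLM is scored by one unified detector rather than a per-model one.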
