PHANTOM-DGA: Multi-Task Transformer With Cached Phonotactics, Dictionary Segmentation, and Meta-Learning for Robust DGA Domain Detection
Keywords:
Domain Generation Algorithm (DGA), Malicious Domain Detection, Transformer Encoder, Self-Supervised Pretraining, Contrastive Learning, Meta-Learning, Phonotactics, Calibration
Abstract
Domain generation algorithms (DGAs) enable malware to evade static blacklists by generating large volumes of pseudo-random candidate domains. Although many published detectors achieve near-perfect performance under random i.i.d. splits, their performance often degrades under temporal shift and on previously unseen DGA families. This paper presents PHANTOM-DGA, a GPU-optimized, checkpointable training pipeline and feature-gated multi-task transformer that combines byte-level sequence modeling with lightweight side features: (i) classical lexical statistics, (ii) dictionary-based segmentation features, and (iii) phonotactic features derived from an n-gram consonant–vowel language model trained only on benign training data to avoid leakage. PHANTOM-DGA also supports self-supervised pretraining (masked modeling plus contrastive alignment) and optional Reptile-style meta-learning to improve robustness in a held-out-family evaluation setting. Experiments on the public ‘Domain Generation Algorithm’ dataset (≈1.82M unique canonicalized domains; 52 DGA families) cover three evaluation protocols: random split, time-forward split, and unseen-family split. Under the random protocol, PHANTOM-DGA reaches ROC-AUC 0.9986 and F1 0.9916. Under the time-forward protocol, ROC-AUC drops to between 0.9674 and 0.9697, depending on the ablation. Under the unseen-family protocol, the best ROC-AUC is 0.9359, with reduced false positives at 95% TPR. The results show that protocol choice dominates headline performance and that robustness gains require explicit distribution-shift evaluation and training strategies.
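To make the phonotactic side feature concrete, the following is a minimal sketch of an n-gram consonant–vowel language model fitted on benign labels only. It is an illustration under stated assumptions, not the PHANTOM-DGA implementation: the symbol alphabet (C, V, D, O), the trigram order, add-alpha smoothing, and the names `cv_pattern` and `CVNgramLM` are all hypothetical choices made here for clarity.

```python
import math
from collections import Counter

VOWELS = set("aeiou")  # assumption: ASCII vowels only


def cv_pattern(label: str) -> str:
    """Map a domain label to a string over {C, V, D, O}
    (consonant, vowel, digit, other)."""
    out = []
    for ch in label.lower():
        if ch in VOWELS:
            out.append("V")
        elif ch.isalpha():
            out.append("C")
        elif ch.isdigit():
            out.append("D")
        else:
            out.append("O")
    return "".join(out)


class CVNgramLM:
    """Hypothetical add-alpha smoothed n-gram model over CV patterns."""

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of (n-1)-gram contexts
        self.vocab = ["C", "V", "D", "O", "$"]  # symbols that can be predicted

    def _pad(self, pattern: str) -> str:
        # "^" marks the start of a label, "$" marks its end.
        return "^" * (self.n - 1) + pattern + "$"

    def fit(self, benign_labels):
        """Count n-grams using benign training labels only (avoids leakage)."""
        for label in benign_labels:
            s = self._pad(cv_pattern(label))
            for i in range(len(s) - self.n + 1):
                gram = s[i:i + self.n]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1
        return self

    def avg_neg_log_likelihood(self, label: str) -> float:
        """Mean negative log-probability per predicted symbol.
        Higher values mean the label looks less like benign phonotactics."""
        s = self._pad(cv_pattern(label))
        total, count = 0.0, 0
        for i in range(len(s) - self.n + 1):
            gram = s[i:i + self.n]
            num = self.ngrams[gram] + self.alpha
            den = self.contexts[gram[:-1]] + self.alpha * len(self.vocab)
            total -= math.log(num / den)
            count += 1
        return total / count


if __name__ == "__main__":
    # Toy benign sample; a real pipeline would use the benign training split.
    benign = ["google", "example", "wikipedia", "amazon", "mozilla"]
    lm = CVNgramLM(n=3).fit(benign)
    for candidate in ["banana", "xkqzwrtplm"]:
        print(candidate, round(lm.avg_neg_log_likelihood(candidate), 3))
```

The score from `avg_neg_log_likelihood` could serve as one scalar side feature per domain. Fitting on the benign training split alone means that no statistics from validation, test, or malicious samples reach the feature, which is the leakage concern the abstract raises.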