Visible-infrared person re-identification (VI-ReID) enhances identity retrieval across different illumination conditions by matching visible and infrared modalities. However, existing contrastive-learning-based approaches focus predominantly on cross-modal feature alignment at the expense of modality-specific discriminative cues, which undermines model reliability in complex scenarios. To address this challenge, we introduce a Dual Alignment Knowledge Distillation (DAKD) framework that performs comprehensive self-distillation at both the instance and class levels. The framework incorporates a temperature-modulated alignment strategy that captures rich modality-invariant generalities as well as modality-specific discriminative details. In addition, we propose a confidence-based selective masking mechanism that steers distillation towards confident and informative teacher predictions. To further improve robustness against modality discrepancies and intra-class variations, we develop a dedicated augmentation technique, CutSwap, which exchanges image channels to simulate realistic cross-modality variations. Extensive experiments on the SYSU-MM01 and RegDB benchmarks demonstrate superior performance over state-of-the-art methods, achieving rank-1 accuracies of 76.31% and 94.83%, respectively, and validating the efficacy of DAKD in maintaining robust cross-modal alignment while preserving essential identity-specific discriminative information.
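To make the distillation components concrete, the following is a minimal sketch of a temperature-scaled self-distillation loss combined with a confidence-based selective mask, as described above. The temperature value, confidence threshold, and function name are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch: temperature-scaled knowledge distillation with a
# confidence-based selective mask. Hyperparameter values are assumptions.
import torch
import torch.nn.functional as F

def masked_distillation_loss(student_logits: torch.Tensor,
                             teacher_logits: torch.Tensor,
                             temperature: float = 4.0,
                             conf_threshold: float = 0.5) -> torch.Tensor:
    """KL-divergence distillation that keeps only confident teacher samples."""
    # Soften both distributions with the distillation temperature.
    t_prob = F.softmax(teacher_logits / temperature, dim=1)
    s_logprob = F.log_softmax(student_logits / temperature, dim=1)

    # Confidence mask: retain samples whose maximum teacher probability
    # (without temperature scaling) exceeds the threshold.
    conf = F.softmax(teacher_logits, dim=1).max(dim=1).values
    mask = (conf > conf_threshold).float()

    # Per-sample KL divergence, scaled by T^2 as in standard distillation.
    kl = F.kl_div(s_logprob, t_prob, reduction="none").sum(dim=1) * temperature ** 2
    return (kl * mask).sum() / mask.sum().clamp(min=1.0)
```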
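Similarly, one plausible reading of the CutSwap augmentation is sketched below: within a random rectangular region, all colour channels are replaced by a single randomly selected channel, mimicking the single-channel appearance of infrared images. The region sizing, channel selection, and function name are assumptions rather than the authors' exact recipe.

```python
# Hedged sketch of a CutSwap-style channel-exchange augmentation.
# Region size range and channel choice are illustrative assumptions.
import torch

def cutswap(img: torch.Tensor, min_ratio: float = 0.3, max_ratio: float = 0.7) -> torch.Tensor:
    """img: (3, H, W) visible image in [0, 1]; returns an augmented copy."""
    _, h, w = img.shape
    out = img.clone()

    # Sample a random rectangular region.
    rh = int(h * torch.empty(1).uniform_(min_ratio, max_ratio).item())
    rw = int(w * torch.empty(1).uniform_(min_ratio, max_ratio).item())
    top = torch.randint(0, h - rh + 1, (1,)).item()
    left = torch.randint(0, w - rw + 1, (1,)).item()

    # Pick one channel and copy it into all three channels of the region.
    c = torch.randint(0, 3, (1,)).item()
    patch = img[c, top:top + rh, left:left + rw]
    out[:, top:top + rh, left:left + rw] = patch.unsqueeze(0).expand(3, -1, -1)
    return out
```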