Loughborough University

MSFPLC: Trend-aware multiscale stack fusion Packet Loss Concealment for linear prediction-based speech codecs

journal contribution
posted on 2025-11-10, 15:53 authored by Haohan Shi, Xiyu Shi, Safak Dogan
This paper proposes a Packet Loss Concealment (PLC) method for speech codecs based on linear predictive coding. It uses attention mechanisms and Long Short-Term Memory networks to reconstruct the Linear Predictive Coefficients. A novel multiscale trend-aware multi-head self-attention network is designed to capture the long-term global correlations and short-term local dependencies of speech signals across different time scales, providing effective global and local receptive fields during the reconstruction of lost packets. A new multiscale Stack Fusion method is introduced to further enhance reconstruction performance. It assigns higher weights to speech frames closer to the lost packets and lower weights to distant ones, enabling effective integration of global and local features across time scales. Additionally, a tailored loss function is proposed to guide model training by balancing numerical precision, structural periodicity, and perceptual fidelity. Objective and subjective evaluations consistently indicate that the proposed method sustains robust performance across varying packet loss rates and speakers, underscoring its enhanced generalization. The alignment of improvements across multiple evaluation metrics demonstrates that these advances are architectural rather than dataset-specific. Notably, the results show that integrating codec-internal parameters with multiscale temporal modeling provides greater intrinsic robustness than post-processing PLC methods. Furthermore, the proposed model requires only 0.16 Giga Multiply-Accumulate Operations per second, underscoring its strong potential for high-quality real-time speech communication applications.
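The abstract states that Stack Fusion gives frames near the lost packet more weight than distant ones when combining features from several time scales. It does not specify the weighting function. The sketch below is therefore a hypothetical illustration of proximity-weighted multiscale fusion, not the authors' method. The exponential decay, the `decay` parameter, averaging across scales, and the function names `proximity_weights`, `weighted_pool` and `fuse_scales` are all assumptions.

```python
import math
from typing import List, Sequence


def proximity_weights(n_frames: int, decay: float = 0.5) -> List[float]:
    """Normalised weights over n_frames past frames (hypothetical scheme).

    Index n_frames - 1 is the frame just before the lost packet and gets the
    largest weight. Weights decay exponentially with distance (an assumed
    form, not taken from the paper) and sum to 1.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    raw = [math.exp(-decay * (n_frames - 1 - i)) for i in range(n_frames)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_pool(frames: Sequence[Sequence[float]], decay: float = 0.5) -> List[float]:
    """Collapse a (time, feature) sequence into one vector using proximity weights."""
    w = proximity_weights(len(frames), decay)
    dim = len(frames[0])
    return [sum(w[t] * frames[t][d] for t in range(len(frames))) for d in range(dim)]


def fuse_scales(per_scale: Sequence[Sequence[Sequence[float]]], decay: float = 0.5) -> List[float]:
    """Pool each time scale separately, then average the pooled vectors.

    Equal averaging across scales is an illustrative assumption.
    """
    pooled = [weighted_pool(frames, decay) for frames in per_scale]
    dim = len(pooled[0])
    return [sum(p[d] for p in pooled) / len(pooled) for d in range(dim)]


if __name__ == "__main__":
    # Two hypothetical scales: 3 fine-grained frames and 2 coarse frames,
    # each frame a 1-D feature vector.
    fine = [[0.0], [1.0], [2.0]]
    coarse = [[0.0], [4.0]]
    print(proximity_weights(3))
    print(fuse_scales([fine, coarse]))
```

Because the most recent frame always receives the largest weight, the fused vector leans toward the context immediately preceding the gap. A learned or attention-derived weighting, as the paper's network presumably uses, would replace the fixed exponential here.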

Funding

Loughborough University (Grant No. GS1016)

China Scholarship Council (Grant No. 202208060237)

History

School

  • Loughborough University, London

Published in

IEEE Transactions on Audio, Speech and Language Processing

Volume

33

Pages

3988 - 4003

Publisher

Institute of Electrical and Electronics Engineers

Version

  • AM (Accepted Manuscript)

Rights holder

© IEEE

Publisher statement

This accepted manuscript has been made available under the Creative Commons Attribution licence (CC BY) under the IEEE JISC UK green open access agreement.

Acceptance date

2025-09-16

Publication date

2025-09-25

Copyright date

2025

ISSN

1063-6676

eISSN

1558-7924

Language

  • en

Depositor

Dr Xiyu Shi. Deposit date: 16 September 2025