Reducing tail latency for multi-bottleneck in datacenter networks: a compound approach
The effectiveness of network congestion control fundamentally depends on the accuracy and granularity of congestion feedback. In datacenter networks, precise feedback is essential for achieving high performance. Most existing approaches use either Explicit Congestion Notification (ECN) or network delay (e.g., RTT) independently as congestion indicators. However, in multi-bottleneck networks, the limitations of these signals become more pronounced: ECN struggles with large cumulative end-to-end latency, while RTT lacks the precision needed to control queuing delays at individual hops. To address these challenges, we propose Cocktail, a simple yet effective transport protocol for datacenter networks that combines both ECN and RTT congestion signals to more effectively handle multi-bottleneck scenarios. By leveraging the ECN signal, Cocktail bounds per-hop queue lengths, enhancing its ability to control single-hop latency and prevent packet loss. Additionally, by estimating RTT, Cocktail effectively manages end-to-end delay, resulting in lower Flow Completion Time (FCT). Extensive experimental evaluations in Mininet demonstrate that Cocktail significantly reduces the average and 99th-percentile completion times for small flows by up to 20% and 29%, respectively, compared to current practices under production workloads.
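The abstract does not give Cocktail's actual update rule, so the following is only a minimal sketch of how a sender might combine the two signals. It assumes a DCTCP-style moving average of the ECN-marked fraction for the per-hop signal and a simple delay target for the end-to-end signal. All names and constants (`CompoundWindow`, `gain`, `base_rtt`, `target_delay`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CompoundWindow:
    """Hypothetical sender state; none of these names come from the paper."""
    cwnd: float = 10.0           # congestion window, in packets
    alpha: float = 0.0           # moving average of the ECN-marked fraction
    gain: float = 1.0 / 16       # averaging gain (assumed value)
    base_rtt: float = 100e-6     # propagation RTT in seconds (assumed value)
    target_delay: float = 50e-6  # tolerated end-to-end queuing delay (assumed)
    min_cwnd: float = 1.0        # lower bound on the window, in packets

    def on_window_acked(self, acked: int, marked: int, rtt: float) -> float:
        """Update cwnd once per window of ACKs and return the new value."""
        # Per-hop signal: track how many ACKs in this window carried ECN marks.
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - self.gain) * self.alpha + self.gain * frac
        ecn_factor = 1 - self.alpha / 2 if marked else 1.0

        # End-to-end signal: queuing delay accumulated over all hops.
        queuing = max(rtt - self.base_rtt, 0.0)
        if queuing > self.target_delay:
            rtt_factor = (self.base_rtt + self.target_delay) / rtt
        else:
            rtt_factor = 1.0

        # React to whichever signal asks for the larger reduction;
        # otherwise grow additively by one packet.
        factor = min(ecn_factor, rtt_factor)
        if factor < 1.0:
            self.cwnd = max(self.cwnd * factor, self.min_cwnd)
        else:
            self.cwnd += 1.0
        return self.cwnd
```

In this sketch, a path with several lightly loaded hops can stay below every switch's marking threshold (`marked == 0`) and still trigger a reduction once the summed queuing delay exceeds `target_delay`. That is the multi-bottleneck case the abstract says ECN alone handles poorly.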
School
- Science
Department
- Computer Science
Published in
Computer Networks
Volume
257
Publisher
Elsevier B.V.
Version
- AM (Accepted Manuscript)
Rights holder
© Elsevier B.V.
Publisher statement
This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
Acceptance date
2024-11-16
Publication date
2024-11-23
Copyright date
2024
ISSN
1389-1286
eISSN
1872-7069
Language
- en