Publications

Darwin: Flexible Learning-based CDN Caching

Published in ACM SIGCOMM, 2023

Cache management is critical for Content Delivery Networks (CDNs), impacting both their performance and their operational costs. Most production CDNs apply static, hand-tuned caching policy parameters at cache servers, such as admission frequency or size thresholds for the Hot Object Cache (HOC). However, these static policies fall short when a server faces unpredictable traffic pattern changes, even when the policies employ multiple control parameters (knobs). Recent approaches have proposed learning-based solutions that dynamically adjust policy parameters, but they are limited in their action spaces or caching objectives, or impose high overhead. We propose Darwin, a CDN cache management system that is robust to traffic pattern changes and can flexibly optimize different caching objectives with unrestricted action spaces. Darwin employs a three-stage pipeline: traffic pattern feature collection, unsupervised clustering for classification, and neural bandit expert selection to choose the optimal caching policy. Through extensive simulations, experiments with an Apache Traffic Server (ATS)-based prototype, and theoretical analysis, we show that Darwin achieves significant performance gains with respect to different objectives, such as maximizing object hit rates and minimizing disk writes, while simultaneously adapting to traffic pattern shifts. Darwin imposes negligible overhead and achieves high throughput compared to the state of the art.
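
The three-stage pipeline described above can be pictured with a small sketch. The snippet below is an illustrative simplification, not Darwin's implementation: feature collection, clustering, and expert selection are reduced to a few lines, and the paper's neural bandit is replaced by a plain epsilon-greedy bandit. All names (extract_features, ExpertBandit, the chosen features and constants) are hypothetical.

```python
# Minimal sketch of a Darwin-like control loop (illustration only):
# (1) summarize a trace window into features, (2) cluster feature vectors to
# classify the traffic pattern, (3) per cluster, pick a caching-policy "expert"
# with a bandit and update it with the observed reward for that window.
import numpy as np

def extract_features(sizes, counts):
    """Hypothetical window features: mean size, size spread, reuse fraction."""
    sizes = np.asarray(sizes, dtype=float)
    freq = np.asarray(counts, dtype=float)
    return np.array([sizes.mean(), sizes.std(), (freq > 1).mean()])

def kmeans(X, k, iters=50, seed=0):
    """Tiny k-means stand-in for the unsupervised classification stage."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

class ExpertBandit:
    """Epsilon-greedy selection among caching-policy experts (one per cluster).
    Darwin uses a neural bandit; this stand-in only shows the control flow."""
    def __init__(self, n_experts, eps=0.1, seed=0):
        self.rewards = np.zeros(n_experts)
        self.pulls = np.zeros(n_experts)
        self.eps, self.rng = eps, np.random.default_rng(seed)

    def choose(self):
        if self.rng.random() < self.eps or self.pulls.min() == 0:
            return int(self.rng.integers(len(self.pulls)))
        return int(np.argmax(self.rewards / self.pulls))

    def update(self, expert, reward):
        self.pulls[expert] += 1
        self.rewards[expert] += reward

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((100, 3))                       # stand-in feature history
    centers = kmeans(X, k=3)
    bandits = [ExpertBandit(n_experts=4) for _ in range(3)]
    window = extract_features(sizes=rng.integers(1, 1000, 500),
                              counts=rng.integers(1, 5, 500))
    cluster = int(np.argmin(((centers - window) ** 2).sum(axis=1)))
    expert = bandits[cluster].choose()
    bandits[cluster].update(expert, reward=0.42)   # e.g., observed hit rate
```

In a real deployment, the reward would come from the measured objective over the next traffic window (e.g., object hit rate or disk writes), and the bandit state would be maintained per traffic-pattern cluster.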

Recommended citation: Jiayi Chen, Nihal Sharma, Tarannum Khan, Shu Liu, Brian Chang, Aditya Akella, Sanjay Shakkottai, and Ramesh K. Sitaraman. 2023. Darwin: Flexible Learning-based CDN Caching. In Proceedings of the ACM SIGCOMM 2023 Conference (ACM SIGCOMM '23). Association for Computing Machinery, New York, NY, USA, 981–999. https://doi.org/10.1145/3603269.3604863

SFC: Near-Source Congestion Signaling and Flow Control

Published on arXiv, 2023

State-of-the-art congestion control algorithms for data centers do not, on their own, cope well with transient congestion and high traffic bursts. To address this, we revisit the concept of direct backward feedback from switches and propose Back-to-Sender (BTS) signaling to many concurrent incast senders. Combining it with our novel approach to in-network caching, we achieve near-source, sub-RTT congestion signaling. Source Flow Control (SFC) combines these two simple signaling mechanisms to pause traffic sources immediately, thereby avoiding the head-of-line blocking problem of conventional hop-by-hop flow control. Our prototype system and large-scale simulations demonstrate that near-source signaling can significantly reduce message completion times for various workloads in the presence of incast, complementing existing congestion control algorithms. Our results show that SFC can reduce 99th-percentile flow completion times by 1.2-6x and peak switch buffer usage by 2-3x compared to recent incast solutions.
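
To make the near-source pause idea concrete, here is a toy, single-switch sketch, not the SFC design: when the switch queue crosses a threshold it returns a BTS signal to the sender, which pauses briefly before resuming. The constants (QUEUE_THRESHOLD, PAUSE_US) and classes are hypothetical and chosen only to show the control loop; network delay for the BTS packet is collapsed to zero to keep the sketch short.

```python
# Toy illustration of BTS-style near-source pausing (not the SFC implementation).
from collections import deque

QUEUE_THRESHOLD = 32   # packets; hypothetical trigger for emitting a BTS signal
PAUSE_US = 5           # hypothetical pause duration applied at the source

class Switch:
    def __init__(self, drain_per_tick=4):
        self.queue = deque()
        self.drain_per_tick = drain_per_tick

    def enqueue(self, pkt):
        self.queue.append(pkt)
        # Signal the sender directly instead of waiting for end-to-end feedback.
        # In reality the BTS packet travels back through the network; here it is
        # modeled as an immediate return value.
        return len(self.queue) > QUEUE_THRESHOLD

    def drain(self):
        for _ in range(min(self.drain_per_tick, len(self.queue))):
            self.queue.popleft()

class Source:
    def __init__(self, to_send):
        self.to_send = to_send
        self.paused_until = 0

    def tick(self, now, switch):
        if now < self.paused_until or self.to_send == 0:
            return
        self.to_send -= 1
        if switch.enqueue(("pkt", now)):       # BTS received: pause at the source
            self.paused_until = now + PAUSE_US

# Incast scenario: many sources share one congested switch.
switch = Switch()
sources = [Source(to_send=100) for _ in range(16)]
max_q = 0
for t in range(1000):
    for s in sources:
        s.tick(t, switch)
    max_q = max(max_q, len(switch.queue))
    switch.drain()
print("peak queue depth:", max_q)
```

Because the pause is applied at the traffic source rather than hop by hop, upstream links are not blocked for unrelated flows, which is the head-of-line blocking problem the abstract contrasts against.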

Recommended citation: Yanfang Le, Jeongkeun Lee, Jeremias Blendin, Jiayi Chen, Georgios Nikolaidis, Rong Pan, Robert Soulé, et al. 2023. SFC: Near-Source Congestion Signaling and Flow Control. arXiv preprint arXiv:2305.00538.