Publications

Publications by category in reverse chronological order.

preprints

  1. SwiftQueue: Optimizing Low-Latency Applications with Swift Packet Queuing
    Siddhant Ray, Xi Jiang, Jack Luo, Nick Feamster, and Junchen Jiang
    2024
  2. CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
    Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, and Junchen Jiang
    2024
  3. RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation
    Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Ganesh Ananthanarayanan, Ravi Netravali, and Junchen Jiang
    2024

peer reviewed

2024

  1. CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
    Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, and Junchen Jiang
    In Proceedings of the ACM SIGCOMM 2024 Conference 2024
  2. Eloquent: A More Robust Transmission Scheme for LLM Token Streaming
    Hanchen Li, Yuhan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, and Junchen Jiang
    In Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing 2024

2022

  1. A New Hope for Network Model Generalization
    Alexander Dietmüller, Siddhant Ray, Romain Jacob, and Laurent Vanbever
    In Proceedings of the 21st ACM Workshop on Hot Topics in Networks 2022

2020

  1. Machine learning based cell association for mMTC 5G communication networks
    Siddhant Ray and Budhaditya Bhattacharyya
    International Journal of Mobile Network Design and Innovation 2020

posters

  1. Transformer-based Predictions for Sudden Network Changes (Poster)
    Siddhant Ray, Xi Jiang, Zhuohan Gu, Junchen Jiang, and Nick Feamster
    In 21st USENIX Symposium on Networked Systems Design and Implementation 2024