Siddhant Ray

I am a second-year PhD student in Computer Science at the University of Chicago, advised by Junchen Jiang and Nick Feamster. I am interested in machine learning methods for improving performance in computer networks and in efficient serving systems for Large Language Models, with a focus on Retrieval-Augmented Generation (RAG) systems.

Currently, I work on jointly optimizing quality and delay in RAG through query-level configuration selection and resource scheduling. I also work on using Transformer models for per-packet latency prediction to improve queue selection and reduce tail latency for latency-sensitive applications.

In the past, I have worked on advances in Software-Defined Networking, programmable networks, and cloud computing. I have also spent some time developing NLP techniques to analyse political corpora.

I'm fortunate to also be supported by the Liew Family Graduate Fellowship. Prior to starting my PhD, I earned my MSc in Electrical Engineering and Information Technology at ETH Zurich and my B.Tech in Electronics and Communication Engineering at VIT Vellore.

News

Dec, 2024 RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation is on arXiv.
Oct, 2024 SwiftQueue: Optimizing Low-Latency Applications with Swift Packet Queuing is on arXiv.
Aug, 2024 CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving was presented at ACM SIGCOMM’24.
Sep, 2023 Joined the University of Chicago as a PhD student in Computer Science.

Selected publications

  1. CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
    Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, and Junchen Jiang
    In Proceedings of the ACM SIGCOMM 2024 Conference, 2024
  2. A New Hope for Network Model Generalization
    Alexander Dietmüller, Siddhant Ray, Romain Jacob, and Laurent Vanbever
    In Proceedings of the 21st ACM Workshop on Hot Topics in Networks, 2022