Siddhant Ray
I am a second-year PhD student in Computer Science at the University of Chicago, advised by Junchen Jiang and Nick Feamster. I am interested in machine learning methods for improving performance in computer networks and in efficient serving systems for Large Language Models, with a focus on Retrieval-Augmented Generation (RAG) systems.
Currently, I work on jointly optimizing quality and delay in RAG through query-level configuration selection and resource scheduling. I also work on using Transformer models for per-packet latency prediction to improve queue selection and reduce tail latency for latency-sensitive applications.
In the past, I have worked on advances in Software-Defined Networking, programmable networks and cloud computing. Additionally, I have spent some time developing NLP techniques to analyse political corpora.
I'm fortunate to be additionally supported by the Liew Family Graduate Fellowship. Prior to starting my PhD, I earned my MSc in Electrical Engineering and Information Technology at ETH Zurich and my B.Tech in Electronics and Communication Engineering at VIT Vellore.
News
Date | Update |
---|---|
Dec, 2024 | RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation is now on arXiv. |
Oct, 2024 | SwiftQueue: Optimizing Low-Latency Applications with Swift Packet Queuing is now on arXiv. |
Aug, 2024 | CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving was presented at ACM SIGCOMM’24. |
Sep, 2023 | Joined the University of Chicago as a PhD student in Computer Science. |