Alibaba Cloud claims its new Aegaeon pooling system reduces the number of Nvidia GPUs required to serve large language models by 82% during a multi-month beta test inside its Model Studio marketplace. The result, published in a peer-reviewed paper presented at the 2025 ACM Symposium on Operating Systems (SOSP) in Seoul, suggests that cloud providers may be able to extract significantly more inference capacity from existing silicon, especially in constrained markets like China, where the supply of…








