Open Inference Stack Reshuffles — TGI Exits and SGLang Leads
The Hugging Face Spring 2026 report reshuffles the open-source inference stack: Text Generation Inference is in maintenance mode, SGLang has taken the throughput lead, and a new Blackwell-optimized engine is matching TensorRT-LLM. We also cover the NetDevOps adoption gap, CoreWeave's self-build pivot, and why WebRTC is architecturally broken for AI voice.
Welcome to Amaze Networks for Monday, May eleventh. Quick question for you before we get into anything else: when did you last actually evaluate whether your default open-source inference serving framework is still the right choice?
Because the Hugging Face Spring twenty twenty-six State of Open Source report dropped over the weekend, and for a lot of teams, that default is no longer the right choice.
Text Generation Inference, T G I, is in maintenance mode as of December twenty twenty-five. That's Hugging Face's own inference server. Security patches only, no new features. If you're still deploying T G I for new workloads, you're on a dead-end stack. That's the first thing you need to know this Monday.