Open Inference Stack Reshuffles — TGI Exits and SGLang Leads

The Hugging Face Spring 2026 State of Open Source report reshuffles the open-source inference stack: Text Generation Inference is in maintenance mode, SGLang has taken the throughput lead, and a new Blackwell-optimized engine is matching TensorRT-LLM. We also cover the NetDevOps adoption gap, CoreWeave's self-build pivot, and why WebRTC is architecturally broken for AI voice.

Transcript

HOST A

Welcome to Amaze Networks for Monday, May eleventh. Quick question for you before we get into anything else: when did you last actually evaluate whether your default open-source inference serving framework is still the right choice?

HOST B

Because the Hugging Face Spring twenty twenty-six State of Open Source report dropped over the weekend, and the answer for a lot of teams is — it isn't.

HOST A

Text Generation Inference, T G I, is in maintenance mode as of December twenty twenty-five. That's Hugging Face's own inference server. Security patches only, no new features. If you're still deploying T G I for new workloads, you're on a dead-end stack. That's the first thing you need to know this Monday.
