Morning Briefing · Wednesday, April 8, 2026

Meta Drops Llama 4: First Open-Weight Natively Multimodal MoE Models — 10M Token Context Window

Plate I · embedding · space
Embedding space — clusters carry related concepts; the highlighted query vector pulls its nearest neighbors.

Amaze Networks Morning Briefing — Wednesday, April 8, 2026


Top Highlights

Top 3 Highlights

1. Meta Drops Llama 4: First Open-Weight Natively Multimodal MoE Models — 10M Token Context Window

TL;DR: Meta released Llama 4 Scout and Llama 4 Maverick on April 5. Meta bills them as the first open-weight models that are natively multimodal and built on a mixture-of-experts architecture. Scout offers a 10 million token context window, unmatched among open-weight models. Maverick beats GPT-4o and Gemini 2.0 Flash on most of Meta's reported benchmarks, and Meta says it matches DeepSeek v3 on reasoning and coding with less than half the active parameters.

Key Points:

  • Scout: 17B active parameters, 16 experts, 109B total parameters, 10M token context window via iRoPE architecture
  • Maverick: 17B active parameters, 128 experts, 400B total parameters, 1M token context; beats GPT-4o and Gemini 2.0 Flash on most benchmarks
  • Pre-training data mix of 30+ trillion tokens (double Llama 3); Meta reports 390 TFLOPs per GPU at FP8 precision while pre-training the larger Behemoth model
  • Native multimodality via early fusion architecture with MetaCLIP-based vision encoder — not bolted on post-training
  • Meta signals a strategic shift: largest future models will stay proprietary, making Scout/Maverick the "open ceiling" for the foreseeable future

Deep Dive:

The architectural choices in Llama 4 are worth pausing on. The mixture-of-experts design isn't just about benchmark scores; it's about inference economics. Only 17 billion parameters are active per token despite 109B or 400B total parameters, so per-token compute drops dramatically compared to a dense model of equivalent capability. The full weight set still has to sit in accelerator memory, so the savings show up in compute and throughput rather than in memory footprint. For anyone running inference workloads on their own hardware, that cost structure matters enormously.
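The compute-versus-memory split can be put into rough numbers. The sketch below is back-of-envelope arithmetic, not a benchmark: it assumes 2 bytes per parameter (16-bit weights) and the common approximation of about 2 FLOPs per active parameter per generated token. The function name and the dense 400B comparison model are illustrative assumptions.

```python
def moe_serving_estimate(total_params_b, active_params_b, bytes_per_param=2):
    """Rough weight memory (GB) and forward-pass compute (GFLOPs per token).

    Assumptions: 16-bit weights by default, and ~2 FLOPs per active
    parameter per token. Both are approximations for illustration only.
    """
    weight_gb = total_params_b * bytes_per_param   # billions of params * bytes = GB
    gflops_per_token = 2 * active_params_b         # only active params do work per token
    return weight_gb, gflops_per_token


scout = moe_serving_estimate(total_params_b=109, active_params_b=17)
maverick = moe_serving_estimate(total_params_b=400, active_params_b=17)
dense_400b = moe_serving_estimate(total_params_b=400, active_params_b=400)  # hypothetical dense model

for name, (gb, gflops) in [("Scout", scout), ("Maverick", maverick), ("Dense 400B", dense_400b)]:
    print(f"{name}: ~{gb} GB of weights, ~{gflops} GFLOPs per token")
```

Under these assumptions Maverick and the hypothetical dense 400B model need the same weight memory, but Maverick does a small fraction of the per-token compute, which is the economic point of the MoE design.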

The 10 million token context window on Scout has real infrastructure implications. Moving from 128K to 10M tokens isn't a marginal upgrade: it fundamentally changes what's possible for long-document analysis and for workloads that would otherwise need retrieval-augmented generation with a separate retrieval system. But serving 10M tokens also creates a new class of memory bandwidth problem. The KV cache grows linearly with context length, so a 10M-token context, even on a model with only 17B active parameters, means holding and moving enormous amounts of KV cache data, and once that cache is sharded across GPUs or nodes it crosses the interconnect. That is exactly where high-bandwidth fabric design intersects with AI serving infrastructure. Lossless Ethernet with RoCEv2 starts looking less optional and more required at that scale.
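To get a feel for the scale, here is a naive upper-bound estimate of KV cache size. The layer count, KV head count, and head dimension below are assumed values chosen for illustration and should be checked against the published model config; the 400 Gb/s link speed is also an assumption. The estimate ignores chunked or local attention, cache quantization, and paging, all of which can reduce the real footprint.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Naive KV cache size: one key and one value vector per layer, head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def transfer_seconds(num_bytes, link_gbps):
    """Time to move num_bytes over one link at full line rate (no protocol overhead)."""
    return num_bytes * 8 / (link_gbps * 1e9)


# Assumed architecture values (not from the briefing): 48 layers, 8 KV heads, head_dim 128.
ASSUMED = dict(layers=48, kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(1, **ASSUMED)             # bytes of cache per token
at_128k = kv_cache_bytes(128 * 1024, **ASSUMED)      # 128K-token context
at_10m = kv_cache_bytes(10_000_000, **ASSUMED)       # 10M-token context

print(f"per token: {per_token / 1024:.0f} KiB")
print(f"128K context: {at_128k / 2**30:.0f} GiB")
print(f"10M context: {at_10m / 1e12:.2f} TB")
print(f"10M cache over one 400 Gb/s link: {transfer_seconds(at_10m, 400):.0f} s")
```

With these assumed values a single 10M-token sequence lands near 2 TB of cache, far more than one accelerator holds, which is why the cache ends up distributed and the fabric becomes part of the serving path.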

Meta's decision to partially close future models is the strategic signal worth watching. Open-weight Llama has been the tide that lifted many open-source boats — fine-tuning tooling, local inference optimization, enterprise adoption of open models. If the next generation of Llama capability lives behind an API, the entire ecosystem around self-hosted AI shifts. The network engineering takeaway: model inference serving at scale is going to get more traffic-intensive, not less, because even if you're calling an API rather than serving locally, the traffic from AI applications into hyperscaler inference endpoints is still your network's problem.

So What? If you haven't benchmarked Llama 4 Maverick against your current API-based workflows, that's this week's homework. Its benchmark results are genuinely competitive for an open-weight model. For infrastructure engineers: start thinking about what 10M-token context serving actually does to your east-west fabric. It's a different traffic profile than anything you've had to spec before.

Sources: Meta AI Blog, Hugging Face Blog, Evermx Analysis


2. Ultra Ethernet Hits 800GE Line Rate — Keysight and Broadcom Land First Public Interop at OFC 2026

TL;DR: Keysight Technologies and Broadcom demonstrated the industry's first public interoperability of two Ultra Ethernet Consortium (UEC) specification features, Link Layer Retry (LLR) and Credit-Based Flow Control (CBFC), at 800 gigabit Ethernet line rate using Broadcom's Tomahawk switch. This is a production-readiness milestone for the AI fabric standard the consortium has been building toward.

Key Points:

  • First public UEC spec interop at full 800GE line rate — Link Layer Retry and Credit-Based Flow Control validated together
  • Broadcom Tomahawk Ultra Ethernet switch + Keysight Interconnect and Network Performance Tester
  • LLR provides reliable delivery at the link layer (without TCP overhead); CBFC manages flow without PFC's head-of-line blocking
  • UEC 1.0 spec targets AI cluster east-west traffic: high bandwidth, synchronized burst patterns, microsecond tail latency requirements
  • Ethernet Alliance 2026 roadmap adds 400G-per-lane project; UEC focuses on flexible congestion management and small-message performance
  • ~70% of new AI infrastructure deployments now choosing Ethernet-based fabrics over InfiniBand (Broadcom March 2026 earnings)

Deep Dive:

Ultra Ethernet matters because it's the answer to the question "can Ethernet actually beat InfiniBand for AI?" and the answer is increasingly yes — but only with the transport enhancements UEC is standardizing. The problem with using stock Ethernet in AI clusters isn't bandwidth. It's that Ethernet's congestion management was designed for bursty TCP traffic, not synchronized all-reduce operations where tens of thousands of GPUs generate simultaneous traffic bursts every few milliseconds.

Link Layer Retry and Credit-Based Flow Control are the key pieces. LLR handles the reliability problem — instead of dropping packets and waiting for TCP retransmission, it retransmits at the link layer within microseconds, preserving the low-latency profile the application needs. CBFC manages flow without PFC's notorious head-of-line blocking problem, which has historically made Priority Flow Control-based networks brittle. Together, these two mechanisms close the gap between Ethernet's operational simplicity and InfiniBand's AI performance profile.
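The credit idea behind CBFC can be illustrated with a toy model. This is a minimal sketch of generic credit-based flow control, not the UEC wire protocol: the class name, the one-credit-per-packet accounting, and the single shared buffer are simplifying assumptions. The point it shows is that a sender without credit waits instead of transmitting, so the receiver buffer never overflows and nothing is dropped.

```python
from collections import deque


class CreditLink:
    """Toy credit-based link: the sender may transmit only while it holds credits."""

    def __init__(self, receiver_buffer_slots):
        self.credits = receiver_buffer_slots  # one credit per free receiver slot
        self.rx_buffer = deque()
        self.stalled = 0                      # sends deferred for lack of credit

    def send(self, packet):
        if self.credits == 0:
            self.stalled += 1
            return False                      # sender keeps the packet; nothing is dropped
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def drain(self, n=1):
        """Receiver consumes up to n packets and returns one credit for each."""
        drained = []
        for _ in range(min(n, len(self.rx_buffer))):
            drained.append(self.rx_buffer.popleft())
            self.credits += 1
        return drained


link = CreditLink(receiver_buffer_slots=2)
results = [link.send(p) for p in ("a", "b", "c")]  # third send finds no credit
link.drain(1)                                       # receiver frees a slot, credit returns
retry_ok = link.send("c")                           # deferred packet now goes through
print(results, retry_ok, link.stalled)
```

Because back-pressure is expressed as missing credit rather than as a pause frame on a shared priority, a slow receiver stalls only the flows that lack credit.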

The Broadcom Tomahawk interop is significant because Tomahawk is the merchant silicon that shows up in most AI fabric deployments. If UEC validation is running against Tomahawk in a public demo, production deployments are not far behind. The 70% Ethernet-over-InfiniBand adoption figure from Broadcom's earnings call reflects infrastructure already in the ground — but those are mostly RoCEv2 deployments without full UEC features. The next wave of builds, targeting 2027 capacity, is where UEC 1.0 starts shipping in production.

So What? If you're speccing an AI fabric today, get the UEC 1.0 spec on your reading list. Specifically, understand the LLR and CBFC mechanisms and ask vendors which silicon has validated support. The switch from "RoCEv2 with careful PFC tuning" to "UEC with LLR/CBFC" is a real operational improvement: fewer instability events, better congestion behavior, and standards-based interoperability rather than per-vendor tuning.

Sources: Keysight Press Release, Converge Digest, Ethernet Alliance 2026 Roadmap


3. SONiC Passes the Tipping Point — Gartner Projects 40%+ of Large DC Networks by 2026, Orange at 90 Switches Live

TL;DR: New adoption data confirms SONiC has crossed from hyperscale curiosity to enterprise production reality. Gartner projects more than 40% of large datacenter networks (200+ switches) will run SONiC in production by end of 2026. Orange is live with 90 switches, targeting 150+. The AI era is the demand driver — SONiC's roadmap aligns directly with AI fabric requirements.

Key Points:

  • Gartner: 40%+ of organizations with 200+ switches running SONiC in production environments by 2026
  • Orange (telco): 90 SONiC switches in production today, 150+ planned for telco network disaggregation
  • Alibaba Cloud: SONiC deployed across 28 regions, 86 availability zones, 100,000+ whitebox devices
  • Microsoft Azure: SONiC is the AI datacenter NOS — this is the deployment driving the roadmap
  • 4,300+ active contributors across 520+ contributing organizations — ecosystem depth is real
  • AI training and inference requirements (high bandwidth, predictable latency, congestion management) directly align with SONiC's active development roadmap
  • ONUG analysis: next enterprise adoption wave is being pulled forward by AI infrastructure build-outs

Deep Dive:

SONiC's trajectory is a story about what happens when hyperscaler demand creates a platform compelling enough to solve enterprise problems that enterprise vendors couldn't. It originated as Microsoft's solution to the question "why are we paying traditional networking vendors to run an NOS that doesn't do what we need?" The answer was a containerized, Linux-based NOS on commodity ASICs, built around a Redis-backed configuration database (ConfigDB) that decouples configuration from the services that consume it, with the Switch Abstraction Interface (SAI) keeping the NOS independent of the underlying ASIC.
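For readers who have not seen it, ConfigDB content is organized as tables of keyed entries, which is also the shape of a switch's config_db.json file. The sketch below builds a small fragment in that style and runs a simple consistency check before it would be loaded. The port name, VLAN ID, and the `dangling_vlan_members` helper are illustrative assumptions; real deployments carry many more tables and fields.

```python
import json

# Illustrative ConfigDB-style fragment: tables -> keys -> string fields.
config_db = {
    "PORT": {"Ethernet0": {"admin_status": "up", "mtu": "9100", "speed": "100000"}},
    "VLAN": {"Vlan100": {"vlanid": "100"}},
    "VLAN_MEMBER": {"Vlan100|Ethernet0": {"tagging_mode": "untagged"}},
}


def dangling_vlan_members(cfg):
    """Return VLAN_MEMBER keys whose VLAN or port is not defined in the fragment."""
    bad = []
    for key in cfg.get("VLAN_MEMBER", {}):
        vlan, port = key.split("|", 1)
        if vlan not in cfg.get("VLAN", {}) or port not in cfg.get("PORT", {}):
            bad.append(key)
    return bad


print(dangling_vlan_members(config_db))               # empty list when consistent
print(json.dumps(config_db, indent=2, sort_keys=True))
```

Keeping configuration as plain structured data is what makes it practical to lint, diff, and template switch intent with ordinary tooling.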

The enterprise adoption acceleration is real, but with important caveats. Gartner's 40% figure applies to large datacenter networks — organizations running 200+ switches. Below that threshold, SONiC's operational complexity still favors traditional vendor NOS. The learning curve around ConfigDB, the container model, and gNMI support (still maturing across vendors) is non-trivial. What's changing is that enterprise networking teams at AI-scale organizations are now building the expertise anyway, because their AI clusters require it.

The telco angle — Orange running 90 switches live — is underappreciated. Telcos have different requirements than hyperscalers: they need robust BGP implementations, multi-vendor interoperability, and carrier-grade reliability. Orange running SONiC at this scale is validation that those requirements are being met. It also means the open ecosystem contributions from telcos are shaping the platform's direction in ways that benefit enterprise users.

So What? If you're in an enterprise shop running 100+ switches and haven't looked at SONiC in the last 12 months, look again. The Dell Enterprise SONiC Ansible collection has matured significantly. gNMI support has improved, though maturity still varies by vendor, so verify it on your target platform. The question isn't "is SONiC viable" anymore; it's "what migration path makes sense for your environment?" Start with a leaf-spine test cluster and the Network to Code automation tooling. The operational story has improved substantially.

Sources: ONUG Enterprise Adoption Report, SONiC Foundation Growth Announcement, TechTarget SONiC Analysis


Automation

Network Automation

Plate II · automation
Source-of-truth pipeline — intent → diff → apply → verify, idempotent on every revolution.

Source-of-Truth Convergence: NetBox, Nautobot, and Infrahub Compared at NANOG 93

The NANOG 93 session "Network Automation Showdown: NetBox, Nautobot & Infrahub for Source of Truth" reflects a maturing category where the choices are no longer theoretical. The community is actively deploying and comparing these three platforms in production, and the results are clarifying.

NetBox remains the documentation-first IPAM/DCIM powerhouse: if you need a clean, well-supported network inventory and IP management system, it's the reliable choice. Nautobot (Network to Code's fork of NetBox) extends that with automation-first features: GraphQL API, native Git integration, custom workflows, and an app platform that reportedly cuts development time by 70%. Infrahub, the newest entrant, is attempting a more radical model: a graph-database approach where every relationship between network objects is a first-class citizen, enabling more sophisticated dependency tracking and change impact analysis than the relational models behind NetBox and Nautobot support.

So What? If you're starting fresh on a source-of-truth implementation today, the choice has narrowed: Nautobot if you want the automation platform with the most mature ecosystem (Network to Code's tooling, Ansible collection, REST/GraphQL APIs). Infrahub if you're willing to bet on graph-native as the right long-term data model for network relationships — it's earlier stage but architecturally interesting. NetBox if you primarily need documentation-first inventory and don't need automation workflows baked in.
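Whichever platform holds the intent, the pipeline around it tends to follow the same loop: intent, diff, apply, verify. The sketch below shows that loop on plain dictionaries. It calls no NetBox, Nautobot, or Infrahub API, and the interface names, function names, and data shapes are hypothetical. What it demonstrates is idempotence: a second run against converged state produces an empty change set.

```python
def diff(intended, actual):
    """Compute the changes needed to move `actual` to `intended`."""
    return {
        "add": {k: v for k, v in intended.items() if k not in actual},
        "remove": [k for k in actual if k not in intended],
        "change": {k: v for k, v in intended.items() if k in actual and actual[k] != v},
    }


def apply(actual, changes):
    """Return a new state with the change set applied (stand-in for pushing config)."""
    updated = dict(actual)
    updated.update(changes["add"])
    updated.update(changes["change"])
    for key in changes["remove"]:
        del updated[key]
    return updated


def reconcile(intended, actual):
    changes = diff(intended, actual)
    result = apply(actual, changes)
    if result != intended:                      # verify step
        raise RuntimeError("state did not converge to intent")
    return result, changes


intended = {"Ethernet0": {"vlan": 100}, "Ethernet4": {"vlan": 200}}
actual = {"Ethernet0": {"vlan": 10}, "Ethernet8": {"vlan": 300}}

state, first = reconcile(intended, actual)      # first pass makes three changes
_, second = reconcile(intended, state)          # second pass finds nothing to do
print(first)
print(second)
```

In a real pipeline `intended` would come from the source of truth and `actual` from the device, but the convergence property is the same.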

Sources: NANOG 93 Session, Nautobot Documentation


AI-Assisted Ops: The Benchmark Landscape Shifts From Models to Infrastructure

With GPT-5.4 Pro leading at a composite benchmark score of 92, Gemini 3.1 Pro at 87, and Claude Opus 4.6 at 85 (per BenchLM.ai), the model competition is increasingly less about "which model is smarter" and more about "which infrastructure can serve it fast enough at what cost." NVIDIA's GTC 2026 saw enterprise agentic deployment discussion dominate over raw benchmark announcements — the market has moved from "look what it can do" to "how do we run this in production."

For network automation practitioners: the practical implication is that AI-assisted ops tooling is maturing faster than most teams can adopt it. The Itential FlowAI agentic orchestration covered yesterday, combined with this week's model releases, suggests the toolchain is now capable of handling complex, multi-step network changes with appropriate governance. The gap is operator readiness, not model capability.

So What? Pick one AI-assisted workflow to implement this quarter — troubleshooting root-cause analysis is the easiest starting point with the clearest ROI and limited blast radius. Don't wait for the model landscape to settle. It won't.


AI / ML

AI/ML

Plate III · ai / ml
Embedding space — clusters carry related concepts; the highlighted query vector pulls its nearest neighbors.

Meta Signals Hybrid Open-Source Strategy for Future Models

An Axios report published April 6 confirms Meta is planning a hybrid approach for upcoming AI models: open-weight releases for some sizes, proprietary API-only for the largest. This marks a departure from the strategy that made Llama the standard for self-hosted AI. The company's previous commitment to fully open-weight releases at all sizes is quietly being walked back as competitive pressure from proprietary frontier models intensifies.

The practical consequence for infrastructure operators: the Llama family is likely to remain the open-weight standard up to roughly the Scout and Maverick tier, but the compute-intensive, frontier-capability models will live behind Meta's API. This bifurcation mirrors what we've seen with Mistral and Cohere, which pair open-weight releases with proprietary flagship models; the pattern is converging across the industry.

So What? Build your self-hosted inference stack around Llama 4 Scout and Maverick now while they're available open-weight. If Meta's strategic direction holds, the next generation of capability at this performance level may not be available for self-hosting.

Sources: Axios Report, SiliconAngle


Datacenter

Datacenter

Plate IV · datacenter
Datacenter row — per-rack utilization at a glance. Cool colors are slack; warmer fills are pressure.

Liquid Cooling Evolves to Intelligent Thermal Infrastructure — Power Compute Effectiveness Replaces PUE

The datacenter industry is retiring PUE (Power Usage Effectiveness) as the primary efficiency metric in favor of Power Compute Effectiveness (PCE), a metric that ties energy consumption directly to usable compute output. PUE only captures the ratio of total facility power to IT load, so it says nothing about how much useful work that IT power produces. This shift is being driven by AI density: NVIDIA Blackwell GPUs draw up to 1,000 watts per chip, and rack densities have moved from 15 kW to 120-132 kW in AI configurations.
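A small calculation shows why PUE alone can mislead. PUE is total facility power divided by IT power. The briefing's sources do not give a formula for PCE, so the second function below is only a hypothetical proxy (useful compute per facility kilowatt) used to make the contrast concrete. The hall figures are invented for illustration.

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power over IT power (1.0 is ideal)."""
    return total_facility_kw / it_load_kw


def compute_per_facility_kw(useful_pflops, total_facility_kw):
    """Hypothetical compute-centric proxy, NOT a published PCE definition."""
    return useful_pflops / total_facility_kw


# Two invented halls with identical power profiles but different delivered compute.
hall_a = {"total_kw": 1250, "it_kw": 1000, "pflops": 40}
hall_b = {"total_kw": 1250, "it_kw": 1000, "pflops": 60}

for name, hall in (("A", hall_a), ("B", hall_b)):
    print(
        f"Hall {name}: PUE {pue(hall['total_kw'], hall['it_kw']):.2f}, "
        f"{compute_per_facility_kw(hall['pflops'], hall['total_kw']):.3f} PFLOPS per kW"
    )
```

Both halls report the same PUE, yet one delivers 50% more compute per kilowatt, which is the gap a compute-centric metric is meant to expose.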

The operational change: liquid cooling is no longer a passive thermal system but an active, sensor-dense intelligent infrastructure layer. Every liquid loop is now instrumented with flow rate and temperature sensors feeding AI-driven infrastructure management platforms that enable predictive maintenance and dynamic workload migration. Direct-to-chip liquid cooling is now standard for GPU racks above 30-40 kW.

Data Center World 2026 (April 20-23, Washington DC) is featuring 500+ exhibitors focused on this transition — if you're planning AI infrastructure purchases, it's worth tracking the announcements coming out of that event.

So What? When speccing new datacenter capacity for AI workloads, stop using PUE as your efficiency benchmark — it was designed for a world where compute density was measured in kilowatts, not megawatts. Ask vendors for PCE numbers and ask specifically about the telemetry density of their cooling infrastructure. An uninstrumented cooling system is a liability when you're running 100 kW per rack.

Sources: ByteBridge Liquid Cooling 2026, Data Center World 2026 Announcement


Security

Security Architecture

Plate V · security
Zero-trust egress — credentials are injected at the proxy boundary, never reaching the client runtime.

Three-Tier Microsegmentation Framework Peer-Reviewed in ScienceDirect — Academic Validation Catches Up to Practice

A peer-reviewed paper in ScienceDirect introduces a structured three-tier microsegmentation framework for enterprise networks under Zero Trust Architecture. The three tiers: network-based segmentation (VLANs, SDN-controlled zones), host-based segmentation (firewall rules and agents at the workload layer), and application-layer segmentation (service mesh via Istio or Linkerd). The contribution is a formal methodology for combining all three rather than choosing between them.

This matters because most enterprise microsegmentation deployments pick one layer and call it done — typically network-based because it's the path of least resistance for a networking team. The research formalizes what practitioners already know intuitively: defense-in-depth at the segmentation layer requires all three tiers operating together, with consistent policy propagation across the stack.

The work builds on Forrester's "Golden Age of Microsegmentation" framing from earlier this year and provides the kind of vendor-neutral architectural framework that makes a compelling case for investment to executive stakeholders.

So What? If you're building the business case for microsegmentation investment, this peer-reviewed framework gives you vendor-neutral academic backing for a multi-tier approach. Map your current state against the three tiers and identify where you're weakest — application-layer segmentation is typically the gap for enterprise shops that have VLAN-based segmentation but no service mesh.
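The mapping exercise can start as something very simple. The sketch below records which of the three tiers each workload is covered by and reports the gaps. The workload names and control descriptions are hypothetical, and the paper's own methodology is not reproduced here; this is only a way to structure the inventory.

```python
TIERS = ("network", "host", "application")


def coverage_gaps(workloads):
    """Map each workload to the segmentation tiers it lacks (fully covered ones are omitted)."""
    gaps = {}
    for name, controls in workloads.items():
        missing = [tier for tier in TIERS if tier not in controls]
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical inventory: tier -> the control currently enforcing it.
workloads = {
    "billing-api": {"network": "VLAN 210", "host": "host firewall rules"},
    "inventory-db": {"network": "VLAN 220"},
    "web-frontend": {
        "network": "SDN zone dmz",
        "host": "workload agent",
        "application": "service mesh policy",
    },
}

gaps = coverage_gaps(workloads)
for workload, missing in gaps.items():
    print(f"{workload}: missing {', '.join(missing)}")
```

In this invented inventory the application tier is the most common gap, matching the pattern described above for shops with VLAN-based segmentation and no service mesh.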

Sources: ScienceDirect Three-Tier Microsegmentation, SecurityWeek Zero Trust 2026


Science

Science

Plate VI · science
Field schematic — three-body stability under quasi-equal masses, drawn from the day's central result.

Quantum Circuits Have a Noise Floor — Nature Physics Finds Deep Circuits Behave Like Shallow Ones

A study published in Nature Physics in early April 2026 found that quantum circuits have a fundamental practical depth limit: as circuit depth increases, early steps gradually lose their impact due to noise accumulation, causing deep circuits to effectively behave like shallow ones. The researchers describe this as a "strict practical limit on how deep a quantum circuit can be" — meaning noise doesn't just add errors, it erases computation history.
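A toy model gives some intuition for the "forgetting" effect. The sketch below is not the paper's analysis. It assumes a simple depolarizing noise model in which each layer shrinks the surviving signal from earlier gates by a factor of (1 - p), and it picks an arbitrary 1% threshold for "no longer matters". Both the noise rate and the threshold are illustrative assumptions.

```python
import math


def surviving_signal(p, layers_after):
    """Fraction of an early gate's contribution left after `layers_after` noisy layers."""
    return (1 - p) ** layers_after


def effective_depth(p, threshold=0.01):
    """Number of layers after which an early gate's contribution drops below `threshold`."""
    return math.ceil(math.log(threshold) / math.log(1 - p))


for p in (0.001, 0.01, 0.05):
    d = effective_depth(p)
    print(f"noise per layer {p}: early gates fade below 1% after about {d} layers")
```

In this toy picture, making the circuit deeper than the effective depth adds layers whose earliest predecessors no longer influence the output, which is one way to read "deep circuits behave like shallow ones".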

This has direct implications for fault-tolerant quantum computing timelines. Current quantum error correction schemes assume that with enough physical qubits, you can maintain logical qubit fidelity indefinitely. This research suggests there's a more fundamental limitation: noise-induced information loss limits how many sequential operations are actually useful, regardless of error correction overhead. It's a sobering counterpoint to the Oratomic 10,000-qubit announcements and the optimistic Shor's algorithm timelines we covered yesterday.

Meanwhile, Q-Factor emerged from stealth with a $24 million seed round targeting a million-qubit neutral atom quantum computer. The Tel Aviv-based startup (Weizmann Institute and Technion alumni) is betting on proprietary atom transport and Rydberg interaction techniques to bypass current architectural bottlenecks. A million qubits is orders of magnitude beyond current demonstrations, but the $24M seed suggests the investor community remains willing to fund long-horizon bets in this space.

So What? The Nature Physics finding is the kind of result that recalibrates timeline expectations. If you're evaluating quantum computing for cryptographic risk planning, treat the circuit depth limitation as a moderating factor on the most aggressive "RSA is dead by 2030" predictions. The technology is advancing, but the fundamental physics is still pushing back. Q-Factor's million-qubit bet is a 10-year+ story — watch the Weizmann/Technion research output for technical credibility signals.

Sources: ScienceDaily Quantum Circuit Forgetting, Q-Factor Coverage, Nature Physics Citation


Quick Takes

Quick Takes

  • Nornir vs. Ansible performance: OneUptime published a March 2026 analysis finding Nornir is on average 100x faster than Ansible for network automation tasks, attributable to pure Python threading without the serialization overhead of Ansible's YAML-to-module pipeline. Not a new finding, but now it has comparative benchmark data attached to it.

  • Gartner 2026 projection confirmed: Multiple sources confirm Gartner's projection that 60% of enterprises pursuing zero trust will use more than one form of microsegmentation by end of 2026, up from less than 5% in 2023. That adoption curve is steeper than most security budgets are planning for.

  • Solar + rain dual-generation panels: A novel thin-film technology enables solar panels to generate electricity from both sunlight and raindrops simultaneously. Not networking, but genuinely cool materials science with implications for distributed power generation at edge sites and remote datacenter locations.

  • AI benchmark infrastructure: BenchLM.ai's composite benchmark scores: GPT-5.4 Pro at 92, Gemini 3.1 Pro at 87, Claude Opus 4.6 at 85. For coding (SWE-bench Verified), Claude Opus 4.6 leads at 80.8%. The model race is real, but the performance gaps are narrowing faster than the pricing gaps.
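On the Nornir item above: the threading argument is easy to see with the standard library alone. The sketch below does not use Nornir's or Ansible's APIs and does not reproduce the 100x figure. It simulates per-device latency with a sleep and fans the work out over a thread pool, so total time approaches the latency of one device rather than the sum over all devices. Host names and the delay value are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # stand-in for per-device network latency, in seconds


def collect_facts(host):
    """Simulated device task; a real one would open an SSH or API session."""
    time.sleep(DELAY)  # I/O wait, during which other threads can run
    return host, {"reachable": True}


def run_threaded(hosts, workers=20):
    """Run collect_facts for every host concurrently and return {host: facts}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(collect_facts, hosts))


hosts = [f"leaf{i:02d}" for i in range(20)]

start = time.perf_counter()
facts = run_threaded(hosts)
elapsed = time.perf_counter() - start

print(f"{len(facts)} hosts in {elapsed:.2f}s (serial would take about {len(hosts) * DELAY:.2f}s)")
```

Network automation tasks are dominated by waiting on devices, so threads help despite Python's global interpreter lock; CPU-bound work would not see the same gain.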


Watch Today

Watch Today

  • Data Center World 2026 runs April 20-23 in Washington DC, with 500+ exhibitors and major announcements on liquid cooling and power infrastructure expected
  • Meta Llama 4 Behemoth — the flagship model in the Llama 4 family has not yet been released; expect an announcement in the coming weeks
  • UEC 1.0 production deployments — watch for vendor announcements on Tomahawk-based switches with full UEC feature support through Q2 2026
  • SONiC 202505 release — the next major SONiC release is in staging; watch the SONiC Foundation GitHub for merge activity on the AI fabric roadmap items

Amaze Networks Morning Briefing — Published by Beeston Labs — beestonlabs.dev
