Amaze Networks Morning Briefing

№ 01·Top Highlights

1. Anthropic Launches Managed Agents: Agentic AI Becomes Billable Infrastructure

TL;DR: Anthropic's Managed Agents public beta fundamentally changes the economics and architecture of running AI agents at scale. For $0.08/session-hour on top of standard token pricing, you get persistent sessions, stateless harness recovery, and a credential model that finally makes enterprise agent deployment defensible.

Key Points:

Three-component architecture: append-only session event log (durable), stateless harness (crashable, resumable), disposable sandbox (provisioned from recipes)
Pricing: $0.08/session-hour for runtime; idle/terminated time doesn't accrue; standard token costs apply separately
60% TTFT reduction at p50 and 90% at p95, achieved by deferring container provisioning until the moment it is actually needed
MCP proxy model: credentials never reach the execution sandbox, stored in external vaults, accessed through a harness-mediated proxy
API surface: wake(sessionId), getSession(id), and emitEvent(); versioned header managed-agents-2026-04-01, set automatically by the Claude SDK

Deep Dive:

The framing here is subtle but important. Anthropic isn't just selling hosted runners — they're making a specific architectural argument: the harness (orchestration loop) should be stateless and the session (event log) should be durable, and these two things should be independently deployable. That decoupling is the thing that makes agents production-grade rather than prototype-grade.

Today, most agent deployments hold up in demo conditions and fail catastrophically in production because, when the orchestration process dies mid-task, you lose everything. The stateless harness model means a new harness can pick up from the last durable event in the session log. That's not a convenience feature; it's the architectural prerequisite for any agent running tasks measured in hours rather than seconds.
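
To make the recovery argument concrete, here is a minimal local sketch of the pattern, not Anthropic's API. The names SessionLog, run_harness, and HarnessCrash are hypothetical, and the "durable" log is just an in-memory list standing in for real storage. The point it illustrates is that the harness keeps no state of its own and works out where to resume purely from the log.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionLog:
    """Append-only event log: the only durable state in this sketch."""
    events: list = field(default_factory=list)

    def append(self, kind: str, payload: dict) -> None:
        self.events.append({"seq": len(self.events), "kind": kind, "payload": payload})


class HarnessCrash(Exception):
    """Simulated death of the orchestration process."""


def run_harness(log: SessionLog, steps: list, crash_before: Optional[int] = None) -> None:
    """Stateless harness: derive progress from the log, then continue from there."""
    done = {e["payload"]["step"] for e in log.events if e["kind"] == "step_done"}
    for index, step in enumerate(steps):
        if step in done:
            continue  # already recorded as complete, so do not redo it
        if crash_before is not None and index == crash_before:
            raise HarnessCrash(f"harness died before {step!r}")
        log.append("step_done", {"step": step})


steps = ["fetch_config", "render_diff", "apply_change", "verify"]
log = SessionLog()

try:
    run_harness(log, steps, crash_before=2)  # first harness dies mid-task
except HarnessCrash:
    pass

completed_before_crash = [e["payload"]["step"] for e in log.events]
run_harness(log, steps)  # a fresh harness resumes from the last durable event
completed_after_resume = [e["payload"]["step"] for e in log.events]
```

The second call to run_harness shares nothing with the first except the log, which is the whole idea: any harness instance can finish the job.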

The credential model is arguably more significant for enterprise adoption than the pricing. The persistent complaint from security teams about AI agents has been "we can't audit what credentials the agent has access to, and we can't scope them to just what's needed for this task." The MCP proxy pattern answers both: credentials live outside the sandbox, the harness requests scoped tokens for specific tool calls, and the audit trail lives in the session log. This is the same choke-point strategy that made SSE/SASE palatable for enterprise web traffic — funnel everything through an enforcement point.
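
The choke-point idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the Managed Agents implementation: CredentialProxy, the dict-based vault, and the tool names are all hypothetical. It shows the two properties security teams ask for: the caller never receives the secret, and every attempt is logged whether or not it was allowed.

```python
import time


class CredentialProxy:
    """Sits between the sandbox and tools; the only component that reads the vault."""

    def __init__(self, vault: dict, allowed_tools: set):
        self._vault = vault            # stand-in for an external secret store
        self._allowed = allowed_tools  # tools this session is scoped to
        self.audit_log = []            # stand-in for the session event log

    def call_tool(self, session_id: str, tool: str, args: dict) -> dict:
        allowed = tool in self._allowed and tool in self._vault
        self.audit_log.append(
            {"ts": time.time(), "session": session_id, "tool": tool, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"{tool!r} is outside this session's scope")
        secret = self._vault[tool]
        # A real proxy would attach `secret` to the outbound request here.
        # Only the tool's result goes back to the sandbox, never the secret.
        return {"tool": tool, "args": args, "authenticated": bool(secret)}


proxy = CredentialProxy(
    vault={"netbox": "nb-token-123", "dns": "dns-token-456"},  # hypothetical secrets
    allowed_tools={"netbox"},
)
result = proxy.call_tool("sess-1", "netbox", {"device": "leaf-01"})

try:
    proxy.call_tool("sess-1", "dns", {"zone": "example.net"})
    denied = False
except PermissionError:
    denied = True
```

Note that the denied call still lands in audit_log, so the audit trail covers attempts as well as successes.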

The $0.08/session-hour rate is cheap enough to be negligible for enterprise buyers but meaningful enough to signal this is production infrastructure, not a free-tier toy. For a typical 15-30 minute agent task, that's $0.02-$0.04 per run in runtime charges, before token costs. The pricing structure also creates the right incentives: you pay for time consumed, not for the overhead of keeping sessions warm.
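
The per-run figures are simple to reproduce. The sketch below covers only the runtime charge quoted in this briefing; token costs are separate and not modeled, and the function name is ours.

```python
RUNTIME_RATE_USD_PER_HOUR = 0.08  # session-hour price quoted in this briefing


def runtime_cost(active_minutes: float) -> float:
    """Runtime charge for one session; token costs are billed separately."""
    return RUNTIME_RATE_USD_PER_HOUR * active_minutes / 60


for minutes in (15, 30, 240):
    print(f"{minutes:>3} min -> ${runtime_cost(minutes):.2f}")
```

Even a four-hour session comes to $0.32 of runtime, which is why the token bill, not the session-hour fee, is likely to dominate any comparison against a self-hosted harness.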

So What? If you're running multi-step agents with more than 10 tool calls or more than 15-minute horizons, benchmark Managed Agents against your self-hosted harness this week. The stateless recovery model alone may be worth it — and it's likely to eliminate more custom scaffolding code than $0.08/hr adds in cost.

Sources: The Register / Anthropic Engineering Blog - https://www.anthropic.com/engineering/managed-agents

2. Google Doubles Down on Custom Silicon: Next-Gen Intel IPU in Development

TL;DR: Google is co-developing a next-generation custom ASIC Infrastructure Processing Unit with Intel, doubling down on the fully bespoke SmartNIC architecture it pioneered with Mount Evans — and explicitly rejecting the AWS Nitro model. Intel's custom ASIC business hit $1B+ ARR on the back of this relationship.

Key Points:

Mount Evans (current): 200 Gbps, fully custom ASIC (not FPGA, not Arm SoC), deployed in Google Cloud C3 instances since 2022
Next-gen: no specs disclosed yet, but AI cluster east-west bandwidth pressure implies >400 Gbps target
Intel custom ASIC revenue: $1B+ annualized run rate, >50% YoY growth — Google is primary anchor
Three distinct hyperscaler philosophies: Google (custom ASIC via Intel), AWS (SoC via Annapurna Labs), Microsoft (FPGA-based custom logic)
Intel CFO David Zinsner confirmed the ARR figure; announcement framing described by The Register as "desperate to convince the pub[lic]" — the business substance is real, the PR is strained

Deep Dive:

The most interesting angle here isn't that Google is buying more Intel hardware — it's that three of the largest cloud providers have landed on three completely different SmartNIC/DPU architectures, and all three are working. That's not a sign the market hasn't settled; it's a sign that at hyperscale, the "right" answer depends entirely on what you're offloading and what your control plane looks like.

Google's choice of a custom ASIC makes sense given their infrastructure stack. Mount Evans offloads networking (Andromeda virtual networking), security (Google's zero-trust enforcement), and storage I/O (GFS/Colossus) — functions that are deep, stable, and high-volume. A custom ASIC lets you pipeline those functions without software path overhead that even the best Arm SoC can't entirely eliminate. The tradeoff is that you're committed to a multi-year silicon design cycle; you can't patch your way out of a bad architectural decision.

The next-gen IPU timing is almost certainly driven by AI cluster requirements. As GPU-to-GPU east-west bandwidth in training clusters pushes past 400 Gbps per server, the IPU needs to handle RDMA offload, congestion signaling (DCQCN), storage QoS, and security — simultaneously, at line rate. That's exactly the workload profile where custom ASICs beat general-purpose DPU platforms.

For everyone below hyperscale: the lesson isn't "build custom silicon." It's "the SmartNIC is now part of your platform, not an add-on card." Treat DPU/IPU selection with the same architectural seriousness as NOS selection. NVIDIA BlueField is the pragmatic choice for most shops today; this Google/Intel story defines the north star for what "deeply integrated" eventually looks like.

So What? When speccing your next GPU cluster, decide your DPU strategy before you decide your NOS strategy. The DPU is increasingly the enforcement point for networking, security, and storage QoS simultaneously, so making that decision late forces awkward retrofits.

Sources: The Register - https://www.theregister.com/2026/04/09/google_intel_ipu/

3. NVIDIA Nemotron 3 LTM — Open-Weights Telco AI Gets a Three-Agent Network Config Blueprint

TL;DR: NVIDIA released a 30B-parameter open-weights model fine-tuned on telecom standards and vendor documentation, alongside a reference three-agent architecture for network configuration that maps directly onto existing automation pipelines. This is the first credible on-premises AI model for network operations.

Key Points:

Nemotron 3 Large Telco Model: 30B parameters, open weights, fine-tuned via NVIDIA NeMo-Skills pipeline on telecom standards, synthetic logs, and industry documentation
Three-agent blueprint: monitoring agent → configuration application agent → impact assessment agent
Production deployments confirmed at Cassava Technologies and NTT DATA
Designed for on-premises, air-gapped deployment — no cloud dependency
Developed with AdaptKey AI; Intent-Driven RAN Energy Efficiency Blueprint co-developed with Tech Mahindra via GSMA Open Telco AI
Output: structured reasoning traces (step-by-step remediation plans) rather than raw scripts

Deep Dive:

The "air-gapped deployment" framing is doing more work than it looks like. Most previous GenAI-for-NetOps proposals died at the enterprise security review because they required sending operational data (device configs, logs, topology information) to a third-party API. NVIDIA has explicitly designed Nemotron 3 LTM to run entirely on-premises, which removes that objection. You fine-tune against your own vendor docs, runbooks, and config templates. The model never sees your production data unless you put it there.

The three-agent blueprint is well-architected because it mirrors how competent automation shops already structure their pipelines: monitor (collect state), configure (propose and apply changes), assess impact (validate the result). Each agent corresponds to a stage most teams have in some form — what Nemotron adds is an LLM reasoning layer that can navigate ambiguous situations instead of hard-coded conditionals.

The "structured reasoning traces" output format is the right call. An AI that returns a Jinja2 template or a raw Ansible playbook is writing code nobody reviewed. An AI that returns a structured chain of reasoning — "device X is in alarm state Y; the contributing factor is Z; the remediation sequence is A, B, C with expected outcomes" — is something a network engineer can evaluate, approve, and then execute. The human stays in the loop without being in every loop.
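
The monitor, configure, assess pattern with a reviewable trace in the middle can be sketched as follows. This is a toy under stated assumptions, not NVIDIA's blueprint code: the class and function names are hypothetical, and the configuration agent is a plain lookup where a real system would call the model.

```python
from dataclasses import dataclass


@dataclass
class ReasoningTrace:
    """What the engineer reviews: state, cause, ordered fix, expected result."""
    device: str
    alarm: str
    contributing_factor: str
    remediation_steps: list
    expected_outcome: str
    approved: bool = False


def monitoring_agent(telemetry: dict) -> list:
    """Stage 1: collect state and return the devices currently in alarm."""
    return [dev for dev, state in telemetry.items() if state["alarm"]]


def configuration_agent(device: str, state: dict) -> ReasoningTrace:
    """Stage 2: propose a change as a trace rather than a raw script."""
    return ReasoningTrace(
        device=device,
        alarm=state["alarm"],
        contributing_factor=state["cause"],
        remediation_steps=list(state["fix"]),
        expected_outcome=f"{state['alarm']} clears on {device}",
    )


def apply_if_approved(trace: ReasoningTrace) -> list:
    """Human gate: nothing executes until an engineer has signed off."""
    if not trace.approved:
        raise PermissionError("trace has not been approved by an engineer")
    return trace.remediation_steps


def impact_assessment_agent(trace: ReasoningTrace, post_state: dict) -> bool:
    """Stage 3: validate that the alarm is gone after the change."""
    return not post_state[trace.device]["alarm"]


telemetry = {  # hypothetical sample data
    "leaf-01": {"alarm": "CRC errors", "cause": "suspect optic",
                "fix": ["shut port", "replace optic", "no shut port"]},
    "leaf-02": {"alarm": None, "cause": None, "fix": []},
}
alarmed = monitoring_agent(telemetry)
trace = configuration_agent(alarmed[0], telemetry[alarmed[0]])

try:
    apply_if_approved(trace)
    blocked = False
except PermissionError:
    blocked = True

trace.approved = True  # the engineer reviews the trace and signs off
executed = apply_if_approved(trace)
healthy = impact_assessment_agent(trace, {"leaf-01": {"alarm": None}})
```

The approval flag lives on the trace itself, so the same object that was reviewed is the one that gets executed and assessed.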

So What? Pull the NVIDIA Telco Network Configuration Blueprint from the NeMo Agent Toolkit and map its three-agent pattern against your current Nornir/Ansible pipeline stages. This is the automation reference architecture worth adapting before your vendor charges you for their version of it.

Sources: NVIDIA Blog - https://blogs.nvidia.com/blog/nvidia-agentic-ai-blueprints-telco-reasoning-models/

№ 03·Networking

Networking & Architecture

Plate II (networking): Schematic leaf-spine fabric with explicit-path traffic flows across the spine plane and pods at the edges.

GPU Rail Topology Solidifying as AI Cluster Reference Design

The multi-NIC rail topology — where each server hosts multiple NICs each homed to a separate leaf switch — is now the canonical architecture for AI training clusters at hyperscale. Unlike traditional leaf-spine, this design exploits the structured AllReduce communication pattern of GPU training to localize congestion within isolated rails.

Key parameters: leaf tier built 1:1 non-blocking; aggregation at 2:1 when jobs are pod-local; core at 4:1 for inter-pod backbone. A 0.001% packet drop rate can reduce training throughput 10-30%, making PFC + ECN + DCQCN the mandatory lossless transport stack. RoCEv2 over Ethernet is now the de facto standard across Meta, Google, Microsoft Azure, and Amazon — InfiniBand retained only in turnkey DGX SuperPOD deployments. Broadcom Tomahawk 5 is the common silicon substrate.
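
A small sketch of the wiring rule and the oversubscription arithmetic may help. The server count, NIC count, port speeds, and switch names below are illustrative assumptions, not figures from the source; only the 1:1 and 2:1 ratios come from the text above.

```python
def rail_for(nic_index: int) -> str:
    """NIC k on every server homes to the same dedicated rail (leaf) switch."""
    return f"rail-leaf-{nic_index}"


def build_rails(servers: int, nics_per_server: int) -> dict:
    """Return {rail switch: [(server, nic), ...]} for one pod."""
    rails = {}
    for server in range(servers):
        for nic in range(nics_per_server):
            rails.setdefault(rail_for(nic), []).append((server, nic))
    return rails


def oversubscription(downlink_gbps: float, uplink_gbps: float) -> float:
    """Ratio of host-facing to fabric-facing capacity on one switch tier."""
    return downlink_gbps / uplink_gbps


# Assumed pod: 4 servers with 8 NICs each, so 8 isolated rails.
rails = build_rails(servers=4, nics_per_server=8)

# Assumed 400G ports: 32 down and 32 up is 1:1; 32 down and 16 up is 2:1.
leaf_ratio = oversubscription(32 * 400, 32 * 400)
agg_ratio = oversubscription(32 * 400, 16 * 400)
```

Because NIC k always lands on rail k, traffic between same-index NICs on different servers stays on one switch, which is what keeps congestion local to a rail.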

So What? Design GPU cluster fabric with multi-NIC per server and per-NIC uplinks to isolated rail switches. Do not use a single aggregated ToR uplink — this serializes AllReduce traffic and tanks training throughput.

Sources: The Network DNA / Dell'Oro Group - https://www.delloro.com/from-scale-to-optimization-gtc-2026-signals-the-next-phase-of-ai-infrastructure/

IETF NEMOPS Report: Next-Era Network Management Architecture Taking Shape

The Internet Architecture Board (IAB) has released the first NEMOPS (Next Era of Network Management Operations) workshop report, outlining where network management must evolve beyond current NETCONF/YANG/gNMI paradigms. The document targets the gap between today's model-driven telemetry and the closed-loop, AI-assisted management plane that large-scale networks require.

This is early-stage architectural input to standards work, not yet production-implementable. The NANOG 97 CFP (open through April 27) explicitly includes IETF/NANOG collaboration on network management as a focus area. Engineers building automation pipelines on gNMI today should track this: it will determine whether gNMI evolves in place or gets superseded.

Sources: IETF Blog - https://www.ietf.org/blog/iab-nemops-report/

№ 04·Automation

Automation & Programmability

Plate III (automation): Source-of-truth pipeline (intent → diff → apply → verify), idempotent on every revolution.

NVIDIA Slinky: Slurm Clusters as Kubernetes Custom Resources

NVIDIA's Slinky project lets organizations run complete Slurm clusters on top of Kubernetes using a slurm-operator that maps all Slurm daemons into containerized pods with automatic HA. Validated at 8,000+ GPUs with performance parity to traditional Slurm.

Critical details for GB200 NVL72: ComputeDomains automatically manage Internode Memory Exchange (IMEX) domains, creating and destroying them dynamically as workloads start and stop, which ensures full NVLink bandwidth across node boundaries. Topograph exposes the NVLink domain hierarchy to both the Slurm and Kubernetes schedulers simultaneously. A single Prometheus/Grafana stack covers both environments, with per-job GPU metrics labeled with Slurm job IDs.

This collapses the dual HPC/cloud-native stack problem without forcing a migration of existing Slurm workflows.

So What? If you're running both Slurm training and Kubernetes inference today, Slinky is the answer for GB200 NVL72 deployments — evaluate it before designing the next HPC expansion.

Sources: NVIDIA Technical Blog - https://developer.nvidia.com/blog/running-large-scale-gpu-workloads-on-kubernetes-with-slurm/

Dolt: Git-Semantics Database as a Source-of-Truth On-Ramp

Dolt is a MySQL-compatible relational database with full Git semantics (branch, diff, merge, pull request) built in. It exposes standard MySQL wire protocol, meaning any tool that speaks MySQL — Ansible, Python, Nornir inventory plugins — connects without modification.

The practical case: shops that haven't adopted NetBox or Nautobot often have critical network inventory locked in Excel. Dolt offers a migration path that preserves relational structure while adding the version-control discipline automation pipelines need. Branch-based testing means you can stage inventory changes, run Batfish/pyATS validation against the branch state, and only merge to main after passing.
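
The branch-then-merge flow looks roughly like this in practice. The sketch only builds the SQL statements so that it runs without a server; sending them over a MySQL connection is left to whatever client you already use. The devices table, branch name, and helper functions are hypothetical. DOLT_CHECKOUT, DOLT_ADD, DOLT_COMMIT, and DOLT_MERGE are Dolt's version-control stored procedures.

```python
def staged_inventory_change(branch: str, change_sql: list, message: str) -> list:
    """Build the statements that stage an inventory change on a work branch."""
    return [
        f"CALL DOLT_CHECKOUT('-b', '{branch}')",  # create and switch to the branch
        *change_sql,                              # ordinary MySQL-dialect DML
        "CALL DOLT_ADD('-A')",                    # stage every modified table
        f"CALL DOLT_COMMIT('-m', '{message}')",   # commit on the work branch
    ]


def merge_after_validation(branch: str, validation_passed: bool) -> list:
    """Only merge to main once validation against the branch has passed."""
    if not validation_passed:
        return []
    return ["CALL DOLT_CHECKOUT('main')", f"CALL DOLT_MERGE('{branch}')"]


stage = staged_inventory_change(
    branch="add-leaf-07",
    change_sql=["INSERT INTO devices (name, role) VALUES ('leaf-07', 'leaf')"],
    message="Add leaf-07 to inventory",
)
merge = merge_after_validation("add-leaf-07", validation_passed=True)
skipped = merge_after_validation("add-leaf-07", validation_passed=False)
```

The string interpolation is fine for a sketch with fixed inputs; real code should pass branch names and messages as bound parameters. The validation_passed flag is where a Batfish or pyATS run against the branch state would plug in.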

So What? If your source of truth is still a shared spreadsheet, stand up a Dolt instance this week — MySQL compatibility means your existing scripts connect on day one, and you gain full change history for free.

Sources: The Gratuitous ARP - https://gratuitous-arp.net/

№ 05·AI / ML

AI / Machine Learning

Plate IV (AI / ML): Embedding space in which clusters carry related concepts and the highlighted query vector pulls its nearest neighbors.

NVIDIA nvCOMP: 28% Checkpoint Compression, $56K/Month Savings, 30 Lines of Python

NVIDIA's nvCOMP library applies lossless compression (ZSTD or gANS) to model checkpoints, reducing sizes 21-29%. Crucially, compression latency is fully hidden behind write I/O — no training throughput penalty.

For a 405B model training run on 128 DGX B200 GPUs, checkpoint storage drops from ~$200K/month to ~$144K. Implementation: approximately 30 lines of Python, integrating with PyTorch distributed checkpointing. Compression ratios: ~1.27x for dense models, ~1.40x for MoE models. Algorithm selection is operationally critical: ZSTD for NFS/Lustre at 5-10 GB/s; gANS for GPUDirect Storage above 15 GB/s.
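
The numbers above can be cross-checked with a few lines of arithmetic. This sketch uses only figures quoted in this item; the function names are ours, and it does not call nvCOMP. One thing it surfaces: going from $200K to $144K is a 28% reduction, which matches the MoE-end ratio (about 1.39x) rather than the 1.27x quoted for dense models. The source gives no guidance for storage between 10 and 15 GB/s or below 5 GB/s, so the selector declines to guess there.

```python
def compressed_cost(baseline_usd: float, ratio: float) -> float:
    """Storage cost after compression; ratio is original size / compressed size."""
    return baseline_usd / ratio


def size_reduction_pct(ratio: float) -> float:
    """Convert a compression ratio such as 1.27x into a percentage size reduction."""
    return (1 - 1 / ratio) * 100


def pick_algorithm(storage_gb_per_s: float) -> str:
    """Apply the selection rule quoted above; outside its stated ranges, measure."""
    if storage_gb_per_s > 15:
        return "gANS"
    if 5 <= storage_gb_per_s <= 10:
        return "ZSTD"
    return "benchmark both"


dense_pct = size_reduction_pct(1.27)
moe_pct = size_reduction_pct(1.40)
implied_ratio = 200_000 / 144_000  # ratio implied by the quoted monthly costs
monthly_savings = 200_000 - compressed_cost(200_000, implied_ratio)
dense_monthly_cost = compressed_cost(200_000, 1.27)
```

At the dense-model ratio the same bill would land near $157K per month rather than $144K, so it is worth measuring your own checkpoints before budgeting against the headline figure.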

So What? Match your compression algorithm to your storage backend and ship this optimization in your next training sprint — it's genuinely no-downside at non-trivial scale.

Sources: NVIDIA Technical Blog - https://developer.nvidia.com/blog/cut-checkpoint-costs-with-about-30-lines-of-python-and-nvidia-nvcomp/

№ 06·Datacenter

Datacenter

Plate V (datacenter): Datacenter row showing per-rack utilization at a glance. Cool colors are slack; warmer fills are pressure.

Modular Datacenter Strategy: Speed to Market vs. Integration Depth

DataCenter Dynamics (April 10) makes the contrarian case on prefabricated/modular datacenters: the organizations that will lead the AI infrastructure race are not the ones that adopt modular fastest, but the ones that incorporate it most effectively. The distinction matters because modular DC adoption is being driven primarily by time-to-power pressure — hyperscalers and colo providers need capacity in 12-18 months, not the 4+ years traditional construction requires.

The risk: modular approaches optimized purely for speed create operational fragmentation, with parallel management toolchains for prefab modules and traditional infrastructure. The winning pattern is tight integration between prefab capacity and existing orchestration, cooling management, and power monitoring systems. The organization that treats modular as a parallel track rather than an integrated one is building technical debt at rack-dense scale.

Sources: DataCenter Dynamics - https://www.datacenterdynamics.com/en/opinions/the-evolving-role-of-modular-and-prefabricated-data-centers-we-want-our-tokens-now/

OpenAI Pauses Stargate UK — Energy and Regulation Hit AI's International Expansion

OpenAI has paused its planned Stargate datacenter project in the UK, citing energy costs and regulatory complexity. This is the same constraint that drove Maine's moratorium (April 9), the UK nuclear investment (April 9), and the broader 12 GW gap between planned and buildable US datacenter power. The pattern is now global: AI infrastructure ambitions are compressing against physical limits that don't care about funding rounds.

The Register notes the pause came just months after the original announcement — a reminder that AI compute buildout is running significantly faster than energy and planning infrastructure can respond.

Sources: The Register - https://www.theregister.com/2026/04/09/openai_puts_stargate_uk_on/

№ 07·Security

Security (Architecture Only)

Plate VI (security): Zero-trust egress, where credentials are injected at the proxy boundary and never reach the client runtime.

Cisco Extends Zero Trust to AI Agents: MCP Gateway as Enforcement Choke-Point

Cisco announced a layered security architecture specifically for agentic AI workloads. The framework has three layers: agent identity registration in Duo IAM (every agent mapped to an accountable human owner), an MCP gateway that routes all agent tool traffic through a single enforcement point, and a runtime SDK that embeds policy enforcement at build time.

Access is scoped to short-lived, task-specific permissions rather than persistent roles. The framework supports AWS Bedrock, Google Vertex, Azure AI Foundry, and LangChain. A SailPoint survey finding an 80% "unexpected agent behavior" rate underlines the urgency: most enterprises have deployed agents without identity governance.

The architectural insight: the MCP gateway is doing for AI agents what SSE/SASE did for human web traffic — funneling all tool invocations through a single enforcement point where you can log, block, and audit. This is not a conceptual framework; it's a production architecture.
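
The three ideas in this item (registered identity, short-lived task-scoped grants, one enforcement point) fit in a small sketch. It is a toy under stated assumptions and does not represent Cisco's SDK or Duo's API: ToolGateway, Grant, and every identifier in it are hypothetical. Timestamps are passed in explicitly so the expiry behavior is easy to follow.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """A task-specific permission that expires on its own."""
    agent_id: str
    tool: str
    expires_at: float


class ToolGateway:
    """Single enforcement point: every tool invocation is checked and logged here."""

    def __init__(self, owners: dict):
        self._owners = owners  # agent id -> accountable human owner
        self._grants = []
        self.audit = []

    def issue(self, agent_id: str, tool: str, ttl_s: float, now: float = None) -> Grant:
        if agent_id not in self._owners:
            raise PermissionError(f"agent {agent_id!r} has no registered owner")
        now = time.time() if now is None else now
        grant = Grant(agent_id, tool, now + ttl_s)
        self._grants.append(grant)
        return grant

    def invoke(self, agent_id: str, tool: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        allowed = any(
            g.agent_id == agent_id and g.tool == tool and g.expires_at > now
            for g in self._grants
        )
        self.audit.append((agent_id, self._owners.get(agent_id), tool, allowed))
        return allowed


gateway = ToolGateway(owners={"agent-42": "alice@example.com"})
gateway.issue("agent-42", "ticketing.create", ttl_s=300, now=1000.0)

within_ttl = gateway.invoke("agent-42", "ticketing.create", now=1100.0)
after_expiry = gateway.invoke("agent-42", "ticketing.create", now=1400.0)
wrong_tool = gateway.invoke("agent-42", "firewall.push", now=1100.0)

try:
    gateway.issue("agent-99", "ticketing.create", ttl_s=300, now=1000.0)
    unregistered_blocked = False
except PermissionError:
    unregistered_blocked = True
```

Every audit entry carries the owner alongside the agent ID, so a reviewer can go from any tool call to an accountable person.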

So What? Map your deployed AI agents to identity records now — before your MCP surface area becomes unauditable.

Sources: Cisco Newsroom - https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2026/m03/cisco-reimagines-security-for-the-agentic-workforce.html

Zero-Trust GitOps Now Production-Ready: Static Secrets Are End-of-Life

A convergence of Red Hat, Flux CD, and open-source practitioner guidance marks a clear inflection point: secretless, workload-identity-based GitOps is now the expected baseline for regulated environments, not a hardening option. The pattern — ephemeral tokens, policy-as-code enforcement, artifact signature verification at every pipeline stage — is being codified into toolchain defaults. OpenShift GitOps now issues short-lived tokens by default for repository authentication.

The shift from "credential management" to "workload identity" mirrors how zero trust matured in network access: prove who you are, receive a scoped time-limited token, never store a long-lived secret.

So What? Audit your CI/CD pipelines for static credentials or long-lived tokens. The reference architecture to replace them is now stable and well-documented.

Sources: Red Hat Developer / Flux CD - https://developers.redhat.com/articles/2026/03/13/zero-trust-gitops-build-secure-secretless-gitops-pipeline

№ 08·Science

Science

Plate VII (science): Field schematic of three-body stability under quasi-equal masses, drawn from the day's central result.

Sterile Neutrinos Ruled Out — Physics Goes Back to Square One

Multiple independent experiments have now definitively ruled out the sterile neutrino, the most popular proposed extension to the Standard Model for explaining anomalous neutrino oscillation signals. Sterile neutrinos were attractive because a single particle type could simultaneously explain dark matter, neutrino mass, and unexplained oscillation anomalies.

With the hypothesis dead, the field must either revisit detector systematics or find a genuinely new explanation. The sterile neutrino's elimination tightens constraints on where new physics can hide — the anomalies that motivated the hypothesis are still real, which means something else is causing them. Quanta Magazine covered this on April 8 as a genuine paradigm-clearing result.

Why It's Interesting: This is how physics progresses. Eliminating the cheap answer forces more creative thinking about fundamental particle physics.

Sources: Quanta Magazine - https://www.quantamagazine.org/tag/physics/

Record-Breaking Neutrino May Be from an Evaporating Primordial Black Hole

A record-energy neutrino detected in 2023 may trace back to a primordial black hole completing its Hawking evaporation — a relic from the first seconds after the Big Bang. If confirmed, it would be the first direct observational evidence that primordial black holes exist and can evaporate, and would open neutrino observatories as dark matter detectors.

Primordial black holes are a serious dark matter candidate. The detected neutrino's energy profile fits the predicted signature of a small black hole releasing a burst of high-energy particles in its final moments better than any known astrophysical source. The analysis is preprint-stage and requires confirmation.

Sources: arXiv astrophysics - https://arxiv.org/list/astro-ph.CO/current

№ 09·Quick Takes

Quick Takes

CyrusOne gets Illinois approval for 634 MW datacenter campus in Sangamon County. US datacenter buildout continues despite power constraints. (DataCenter Dynamics, April 9)
Finnish energy firm Winda plans 100 MW datacenter in Janakkala, partnering with Gi21 Capital — first DC project for the company, expanding Nordic datacenter footprint. (DataCenter Dynamics, April 9)
IETF NEMOPS standards work advancing — next-era network management architecture targeting the gap between current gNMI streaming and AI-assisted closed-loop control. Watch for NANOG 97 sessions.
MXene molten salt synthesis achieves cleaner, more structurally controlled 2D conductive layers than HF acid etching — relevant to next-gen EMI shielding and energy storage materials.

№ 10·Watch Today

Watch Today

Anthropic Managed Agents documentation: Review the architectural breakdown of session/harness/sandbox decoupling — this is the production agent model
NVIDIA Slinky repo: If you're running GB200 NVL72 or planning HPC/cloud-native convergence, evaluate the ComputeDomain + Topograph layer
IETF NEMOPS mailing list: Early-stage, but the architecture decisions here will define your automation stack in 2-3 years
NANOG 97 CFP closes April 27: Network management and automation submissions especially relevant

№ 11·Top Highlights

Week in Review — April 7-10, 2026

This week had a clear through-line: agentic AI moved from concept to infrastructure primitive. Five separate stories across the week converged on the same theme:

Tuesday (Apr 7): Itential FlowAI launched governed agentic orchestration with an MCP server and audit trail
Wednesday (Apr 8): Source-of-truth platforms (NetBox/Nautobot/Infrahub) converging toward automation-platform status, not just documentation
Thursday (Apr 9): Cisco Meraki launched agentic workflows; Kelsey Hightower argued zero-token architecture as a counter; Meta Muse Spark's proprietary pivot raised the question of whether frontier AI is going closed
Friday (Apr 10): Anthropic launched the infrastructure layer itself — Managed Agents with stateless recovery, durable sessions, and $0.08/hr pricing. NVIDIA's telco agent blueprint. Cisco's security framework for agents.

Secondary theme: the power wall is global. Maine moratorium, UK nuclear investment, OpenAI Stargate UK pause, CyrusOne Illinois approval — every datacenter story this week bumped into the same constraint.

Third theme: the AI fabric is now a SmartNIC story. UEC 800GE interop on Wednesday, Google/Intel custom IPU today — the network-silicon layer for AI clusters is moving faster than the protocol layer.

Anthropic Launches Managed Agents — Agentic AI Becomes Billable Infrastructure

Friday, April 10, 2026
