Morning Briefing · Thursday, April 23, 2026

Google Bifurcates Its TPU Line — and Reveals a Fabric That Makes Everyone Else's Look Small

Plate I · leaf · spine
Schematic leaf-spine fabric — explicit-path traffic flows across the spine plane, pods at the edges.

Amaze Networks — Morning Briefing

Top Highlights
№ 01·Top Highlights


Top 3 Highlights

1. Google Bifurcates Its TPU Line — and Reveals a Fabric That Makes Everyone Else's Look Small

TL;DR: Google announced its eighth-generation Tensor Processing Units as two distinct chips: the TPU 8t for large-scale model training and the TPU 8i for low-latency inference. The chip specs are significant; the interconnect fabric is extraordinary. The Virgo Network links 134,000 chips in a single non-blocking domain with 47 petabits per second of bisection bandwidth, scaling to one million chips across multiple sites.

Key Points:

  • TPU 8t (training): 12.6 petaFLOPS FP4, 216 GB HBM at 6.5 TB/s, 128 MB on-chip SRAM, 3D torus topology; scales to 9,600-chip superpods and clusters of 1 million+ chips; 2.7x price-performance over prior-gen Ironwood
  • TPU 8i (inference): 10.1 petaFLOPS FP4, 288 GB HBM at 8.6 TB/s, 384 MB on-chip SRAM (3x prior gen); "Boardfly" topology cuts network diameter 56% vs. torus (7 hops vs. 16); up to 80% price-performance improvement over Ironwood for latency targets
  • The Boardfly topology uses a high-radix design with optical circuit switches (OCS) at the pod level — purpose-engineered for scatter-gather KV-cache communication during autoregressive inference decoding
  • Virgo Network: 47 Pb/s of non-blocking bisection bandwidth for 134,000 chips in one datacenter, scaling to 1 million chips multi-site via a hierarchical fabric
  • TPU 8i's Collectives Acceleration Engine (CAE) replaces four SparseCores, reducing collective operation latency 5x — specifically engineered for chain-of-thought and agentic reasoning traffic patterns

The Deep Dive:

The Boardfly topology deserves more attention than it is getting. Google designed an interconnect fabric for the TPU 8i specifically around the memory access patterns of inference workloads: irregular KV-cache reads, low-latency scatter-gather, long-context decoding. A 7-hop maximum network diameter for a 1,024-chip pod gives a worst-case latency bound that a torus of the same size cannot match, because torus diameter grows with the length of each ring dimension (16 hops against Boardfly's 7). The optical circuit switches at the pod level are how they achieve the diameter reduction: not exotic, but thoughtfully applied to a specific problem.
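The 56% figure can be checked with hop-count arithmetic. This is a minimal sketch under one assumption that is ours, not Google's: the 1,024-chip torus baseline is laid out as 8 × 8 × 16 with wraparound links. Along a ring of k chips the farthest chip is floor(k/2) hops away, and the worst case sums across dimensions.

```python
from math import prod


def torus_diameter(dims: tuple[int, ...]) -> int:
    """Worst-case hop count in a torus with wraparound links.

    In a ring of k nodes the farthest node is k // 2 hops away, and
    dimension-ordered routing adds up the worst case of each dimension.
    """
    return sum(k // 2 for k in dims)


def diameter_reduction(baseline_hops: int, new_hops: int) -> float:
    """Fractional reduction in network diameter."""
    return 1 - new_hops / baseline_hops


# Assumed layout for a 1,024-chip torus pod (not a published spec).
pod = (8, 8, 16)
assert prod(pod) == 1024

torus_hops = torus_diameter(pod)  # 4 + 4 + 8
boardfly_hops = 7                 # figure reported for Boardfly
print(torus_hops, f"{diameter_reduction(torus_hops, boardfly_hops):.0%}")
```

Other factorizations of 1,024 give different baselines (4 × 16 × 16 yields 18 hops), so the 16-hop figure is consistent with an 8 × 8 × 16 layout rather than proof of one.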

The broader architectural signal is Google publicly stating that training and inference cannot share a single optimized fabric without painful trade-offs. NVIDIA has covered this with HBM bandwidth headroom on H100/B200. Google is saying: stop trying, build two chips with purpose-built interconnects. The TPU 8t training line uses a 3D torus inside each superpod, with the Virgo Network stitching superpods into larger domains; that suits synchronous all-reduce collective operations with high bisection bandwidth requirements. The Boardfly for inference uses high-radix OCS, which suits low-latency irregular access patterns. These are not incremental design choices; they reflect fundamentally different traffic profiles.
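The bisection-bandwidth appetite of synchronous training follows from data volume. A minimal sketch using the textbook ring all-reduce (a reduce-scatter followed by an all-gather); TPU collectives use their own topology-aware algorithms, so treat this as an order-of-magnitude model. The 10 GB gradient size is a hypothetical example.

```python
def ring_allreduce_bytes_per_chip(payload_bytes: float, n_chips: int) -> float:
    """Bytes each chip sends during one ring all-reduce.

    Reduce-scatter and all-gather each take n-1 steps, and every step moves
    payload/n bytes per chip, so each chip sends 2 * (n-1)/n * payload.
    """
    if n_chips < 2:
        return 0.0
    return 2 * (n_chips - 1) / n_chips * payload_bytes


# Hypothetical example: 10 GB of gradients synchronized every training step.
gradients = 10e9
for n in (8, 256, 9_600):
    gb = ring_allreduce_bytes_per_chip(gradients, n) / 1e9
    print(f"{n:>5} chips: {gb:.2f} GB sent per chip per step")
```

Per-chip traffic approaches twice the gradient size no matter how many chips join, and every chip sends it in the same synchronized window, which is the burst a high-bisection torus is built to absorb.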

For network engineers: if your AI infrastructure vendor is selling you a single converged fabric for both training and inference workloads, the correct question to ask is how they handle collective operation congestion isolation when both workload types compete for the same fabric resources. Google's two-fabric design is an implicit admission that this is unsolved on shared fabric at their scale. Enterprise shops don't operate at Google scale, but the physics of congestion isolation at the fabric layer does not change with cluster size.

So What? Do This: Next time you spec an AI cluster fabric, segment the design by workload type before selecting silicon. The question "can this fabric handle both training all-reduce traffic and inference scatter-gather simultaneously" has one honest answer: only with compromises. Design separate rail networks for training and inference if the budget allows; if it doesn't, understand the congestion isolation trade-offs explicitly before signing.
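A toy queueing model shows what congestion isolation buys. This is a minimal sketch, not a fabric simulator: a single 400 Gb/s output port, one small inference packet arriving just behind a burst of all-reduce frames, and strict priority standing in for "separate traffic class". Packet sizes and link speed are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow: str        # "train" or "infer"
    size_bytes: int


def finish_times_us(queue: list[Packet], link_gbps: float) -> list[float]:
    """Serialize packets back to back on one link; return each finish time in us."""
    bits_per_us = link_gbps * 1e3
    t = 0.0
    times = []
    for pkt in queue:
        t += pkt.size_bytes * 8 / bits_per_us
        times.append(t)
    return times


def inference_finish_us(arrivals: list[Packet], link_gbps: float, isolate: bool) -> float:
    """Finish time of the first inference packet.

    The head packet is already on the wire and is never preempted. With
    isolation, queued inference packets go before queued training packets
    (strict priority); without it, the port is plain FIFO.
    """
    head, queued = arrivals[0], arrivals[1:]
    if isolate:
        queued = sorted(queued, key=lambda p: p.flow != "infer")  # stable sort
    order = [head] + queued
    for pkt, t in zip(order, finish_times_us(order, link_gbps)):
        if pkt.flow == "infer":
            return t
    raise ValueError("no inference packet in arrivals")


# Hypothetical burst: 64 jumbo all-reduce frames land just ahead of one KV-cache read.
arrivals = [Packet("train", 9000) for _ in range(64)] + [Packet("infer", 512)]
fifo = inference_finish_us(arrivals, link_gbps=400, isolate=False)
isolated = inference_finish_us(arrivals, link_gbps=400, isolate=True)
print(f"shared FIFO: {fifo:.2f} us, isolated class: {isolated:.2f} us")
```

The inverse is the catch: strict priority can starve training traffic during inference bursts, so real designs pair it with rate limits or weighted scheduling, or sidestep it with physically separate rails.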

Sources: ServeTheHome | Google Cloud Technical Deep Dive | CNBC


2. LiteLLM Supply Chain Worm Targets CI/CD Pipelines — Your Automation Stack Is the Attack Surface

TL;DR: Threat actor TeamPCP compromised LiteLLM PyPI packages 1.82.7 and 1.82.8 in March 2026, using a poisoned version of the Trivy security scanner as the infection vector — meaning LiteLLM's own CI/CD pipeline was the entry point. The malware activates specifically in GitHub Actions environments, exfiltrating LLM API keys, Kubernetes configs, SSH keys, and cloud credentials. A new npm wave hit April 21.

Key Points:

  • LiteLLM versions 1.82.7 and 1.82.8 on PyPI contained credential-stealing malware; the worm propagated via TeamPCP's compromise of Trivy (a widely-used vulnerability scanner) — the security tooling became the infection vector
  • Payload explicitly checks for GITHUB_ACTIONS environment variable before activating — targets GitOps automation pipelines, not developer laptops
  • Stolen credential categories: Kubernetes and Docker configs, CI/CD tokens, cloud credentials, SSH keys, and LLM API keys (OpenAI, Anthropic, Azure OpenAI, Google)
  • LiteLLM as an AI gateway sits at the intersection of every LLM credential in an organization — compromising it yields the entire AI ops layer, not just one service
  • The same TeamPCP campaign also hit Checkmarx's KICS static analysis tool and Telnyx; an npm wave on April 21 at 22:14 UTC targeted @automagik/genie, pgserve, and @openwebconcept
  • Exfiltration uses Internet Computer (ICP) canister endpoints alongside conventional webhooks, a novel channel that is harder to take down than a single webhook URL

The Deep Dive:

The "poisoned security scanner" attack pattern named by Snyk is the architectural innovation worth tracking. TeamPCP did not compromise LiteLLM by injecting code into the library directly — they compromised the vulnerability scanner (Trivy) that LiteLLM's own pipeline used to check for vulnerabilities. The security tooling became the backdoor. This is recursive in the worst possible way: the process you trust to validate your dependencies now delivers malicious code into the thing you were trying to protect.

For teams running LLM-assisted network automation — think AIOps platforms, natural language config generation, AI copilots for NOC analysts — a compromised LiteLLM installation is a complete credential compromise of the AI ops layer. LiteLLM functions as an AI gateway, routing calls to multiple LLM providers with their respective API keys. Every LLM credential in the organization passes through it. Every Kubernetes secret and cloud credential injected as environment variables in the GitOps runner is visible to the malware payload. The blast radius is not one compromised service; it is the entire infrastructure the pipeline touches.
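To gauge that blast radius in your own pipelines, list the credential-shaped variables a CI step can see. A minimal sketch, assuming name-pattern matching is an acceptable first pass (the patterns are our own heuristics, not an exhaustive list). It prints variable names only, never values, so its output is safe to leave in a job log; `GITHUB_ACTIONS` is the variable GitHub sets to `true` on its runners, the same one the payload checks.

```python
import os
import re

# Heuristic name patterns for secrets (our assumption); extend for your environment.
CREDENTIAL_PATTERNS = [
    r"API_KEY$", r"TOKEN$", r"SECRET", r"PASSWORD",
    r"^AWS_", r"^AZURE_", r"^GOOGLE_APPLICATION_CREDENTIALS$",
    r"^KUBECONFIG$", r"^OPENAI_", r"^ANTHROPIC_",
]
_MATCHER = re.compile("|".join(CREDENTIAL_PATTERNS))


def exposed_credential_names(env: dict[str, str]) -> list[str]:
    """Return sorted names (never values) of variables that look like credentials."""
    return sorted(name for name in env if _MATCHER.search(name.upper()))


if __name__ == "__main__":
    names = exposed_credential_names(dict(os.environ))
    in_actions = os.environ.get("GITHUB_ACTIONS") == "true"
    print(f"GITHUB_ACTIONS={in_actions}; {len(names)} credential-like variables visible:")
    for name in names:
        print("  ", name)
```

Anything this prints in a workflow step is also visible to every dependency that step installs and runs.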

The CI/CD activation check is deliberate. The malware does not fire in a local development environment where detection is more likely. It waits for GitHub Actions, where secrets are injected as environment variables and pipeline tokens have elevated repository permissions. A GitOps-driven network automation shop running Ansible playbooks, Batfish policy checks, or Terraform plans through GitHub Actions is precisely the environment the malware was designed to devastate. The attack chain connects source of truth (Git), pipeline executor (GitHub Actions), and AI gateway (LiteLLM) in a single sweep.

So What? Do This: Today: audit every requirements.txt and pyproject.toml in your automation repos for LiteLLM versions 1.82.7 or 1.82.8; rotate any credentials that ran through affected environments. This week: add PyPI and npm package hash pinning (--require-hashes in pip, package-lock.json integrity checks in npm) and integrate SLSA provenance verification into your CI/CD pipeline. Your supply-chain threat model now has to include your security scanners.
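The first audit step can be scripted. A minimal sketch that walks a repository and flags exact pins of litellm 1.82.7 or 1.82.8 in `requirements*.txt` and `pyproject.toml`. It only recognizes simple `name==version` pins, so ranges such as `litellm>=1.82`, lockfiles, and transitive installs still need a look at the built environment (for example `pip show litellm`).

```python
import re
from pathlib import Path

COMPROMISED = {"litellm": {"1.82.7", "1.82.8"}}

# Exact pins such as: litellm==1.82.7, "litellm == 1.82.8", litellm[proxy]==1.82.7.
# Version ranges are deliberately not evaluated.
PIN = re.compile(
    r"""(?P<name>[A-Za-z0-9_.\-]+)(\[[^\]]*\])?\s*==\s*(?P<version>[0-9][0-9A-Za-z.\-+]*)"""
)


def find_compromised_pins(text: str) -> list[tuple[str, str]]:
    """Return (package, version) for every exact pin on a known-bad release."""
    hits = []
    for m in PIN.finditer(text):
        name = m.group("name").lower().replace("_", "-")
        version = m.group("version")
        if version in COMPROMISED.get(name, set()):
            hits.append((name, version))
    return hits


def scan_repo(root: str) -> list[tuple[str, str, str]]:
    """Scan requirements*.txt and pyproject.toml files under root."""
    findings = []
    for path in Path(root).rglob("*"):
        is_req = path.name.startswith("requirements") and path.suffix == ".txt"
        if path.is_file() and (is_req or path.name == "pyproject.toml"):
            for name, version in find_compromised_pins(path.read_text(errors="replace")):
                findings.append((str(path), name, version))
    return findings
```

Call `scan_repo("path/to/automation-repo")` and treat any hit as a credential-rotation event, not just a version bump.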

Sources: The Register | LiteLLM Security Update | Palo Alto Unit 42 | Trend Micro


3. Kubernetes 1.36 Ships as Ingress NGINX Hits End of Life — Automation Tooling Migration Is Overdue

TL;DR: Kubernetes v1.36 "Haru" released April 22 while its most widely-deployed ingress controller — Ingress NGINX — officially reached end of life March 24, 2026. No further patches, no security fixes. Any team running NetBox, Nautobot, Infrahub, or internal automation APIs behind Ingress NGINX on K8s is now operating unmaintained ingress infrastructure. Gateway API is the official replacement.

Key Points:

  • Ingress NGINX controller EOL: March 24, 2026
  • Kubernetes SIG Network's official recommendation: migrate to Gateway API immediately
  • Supported GA alternatives: Envoy Gateway, Istio, Traefik, Contour — all production-capable today
  • The Kubernetes Ingress API spec (the YAML resource kind) remains supported; tooling that writes Ingress manifests is unaffected — it's the nginx controller binary that receives no more patches
  • Network automation tooling commonly runs on K8s: Nautobot, NetBox, Infrahub, AWX (Ansible Tower OSS), Gitea, ArgoCD — any of these running behind Ingress NGINX needs a migration path before the next exploitable CVE lands without a patch
  • Gateway API introduces role-oriented configuration (GatewayClass / Gateway / HTTPRoute split) — more capable than Ingress spec, better suited for multi-tenant automation platforms
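The role split shows up directly in a manifest: the platform team owns the `Gateway`, and an application team owns only an `HTTPRoute` that attaches to it. A minimal sketch that turns one simple Ingress-style rule (host, path prefix, Service, port) into a `gateway.networking.k8s.io/v1` HTTPRoute. The names `automation-gw`, `infra`, and `nautobot` are hypothetical, and TLS, controller-specific annotations, and regex paths are out of scope; those need a per-controller review.

```python
import json


def ingress_rule_to_httproute(
    name: str,
    namespace: str,
    host: str,
    path_prefix: str,
    service: str,
    port: int,
    gateway: str,
    gateway_namespace: str,
) -> dict:
    """Build a Gateway API v1 HTTPRoute equivalent to one host/path Ingress rule."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Attach to a Gateway owned by the platform team.
            "parentRefs": [{"name": gateway, "namespace": gateway_namespace}],
            "hostnames": [host],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


route = ingress_rule_to_httproute(
    name="nautobot",
    namespace="automation",
    host="nautobot.example.net",
    path_prefix="/",
    service="nautobot",
    port=8080,
    gateway="automation-gw",
    gateway_namespace="infra",
)
# kubectl accepts JSON manifests, e.g.: python make_route.py | kubectl apply -f -
print(json.dumps(route, indent=2))
```

Because this route lives in a different namespace from its Gateway, the Gateway's listener must admit routes from that namespace via `allowedRoutes.namespaces`; otherwise the route is stored but never attached.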

So What? Do This: Run kubectl get ingress --all-namespaces and kubectl get pods -A | grep ingress-nginx in your automation clusters today. If Ingress NGINX is present, build a Gateway API migration plan now; there will be no upstream patch for the next CVE. Envoy Gateway with its native HTTPRoute support is the lowest-friction path for most teams; Istio is appropriate if you are already running it for service mesh.
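The `kubectl get ingress` check can be narrowed to the objects that actually depend on the retired controller. A minimal sketch that parses `kubectl get ingress -A -o json` and separates ingresses whose class is `nginx` (via `spec.ingressClassName` or the legacy `kubernetes.io/ingress.class` annotation) from those with no class set. The class name `nginx` is the common default but is an assumption; adjust `NGINX_CLASSES` to match your cluster.

```python
import json

NGINX_CLASSES = {"nginx"}  # assumption: adjust if your IngressClass has another name


def classify_ingresses(kubectl_json: str) -> tuple[list[str], list[str]]:
    """Split Ingresses into (NGINX-class, no-class-set) lists of namespace/name."""
    nginx, unclassified = [], []
    for ing in json.loads(kubectl_json).get("items", []):
        meta = ing.get("metadata", {})
        ref = f"{meta.get('namespace', 'default')}/{meta.get('name')}"
        annotations = meta.get("annotations") or {}
        cls = ing.get("spec", {}).get("ingressClassName") or annotations.get(
            "kubernetes.io/ingress.class"
        )
        if cls is None:
            unclassified.append(ref)  # may fall back to the default IngressClass
        elif cls in NGINX_CLASSES:
            nginx.append(ref)
    return sorted(nginx), sorted(unclassified)
```

Ingresses with no class may still be served by NGINX if it is the cluster's default IngressClass, so the second list deserves the same review as the first.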

Sources: Kubernetes Blog (Ingress NGINX Retirement) | Kubernetes v1.36 Sneak Peek | Okteto Migration Guide


Networking
№ 03·Networking

Networking & Architecture

Plate II · networking
Schematic leaf-spine fabric — explicit-path traffic flows across the spine plane, pods at the edges.

Google's Virgo Network and the Boardfly Topology

Covered in depth in Top 3 above — the architectural lesson for network teams is the deliberate separation of training and inference fabric topologies.

Europe's Datacenter Geography Redraws Around Grid Readiness

The FLAP-D metro cluster (Frankfurt, London, Amsterdam, Paris, Dublin) is no longer where European datacenter capacity growth is concentrating. Power availability and planning predictability have shifted the competitive advantage to Tier II markets. Milan has effectively graduated to Tier I in this cycle. Sines, Portugal — a deepwater port with disproportionate subsea cable density — is emerging as a structural connectivity node for transatlantic and Africa-Europe routes, with datacenter capacity following the fiber. Nordic capitals offer grid access backed by renewable energy at cost-effective rates.

The EU Data Centre Energy Efficiency Package (Q1 2026) targeting carbon-neutral operations by 2030 is adding compliance cost as a differentiator: locations with renewable access baked into the grid are worth more in 10-year investment planning than locations offering cheaper land today. The metric that matters for new large-scale capacity is no longer "distance to the nearest PoP" — it's "can I get reliable power interconnection permits in under 18 months?"

So What? If you are evaluating European co-lo strategy on a 5+ year horizon, FLAP-D anchor logic is stale. Grid readiness, renewable energy access, and planning process predictability are the 2026 decision variables.

Sources: DataCenter Dynamics (Rethinking Location Strategy)


Automation
№ 04·Automation

Network Automation

Plate III · automation
Source-of-truth pipeline — intent → diff → apply → verify, idempotent on every revolution.

LiteLLM/TeamPCP Supply Chain Worm — Immediate Action Required

Covered in depth in Top 3 above. Coordination note: the Security section covers the "AI gateway as attack surface" architectural pattern this story reveals.

Kubernetes 1.36 / Ingress NGINX EOL — Migration Is Now

Covered in depth in Top 3 above.


AI / ML
№ 05·AI / ML


Plate IV · ai / ml
Embedding space — clusters carry related concepts; the highlighted query vector pulls its nearest neighbors.

Alibaba's Qwen3.6-27B: 27 Billion Parameters That Beat 397 Billion

Alibaba's Qwen team released Qwen3.6-27B, a 27-billion-parameter dense open-weight model that outperforms the Qwen3.5-397B-A17B MoE model (nearly 15x larger by total parameters, with about 17B active per token) on agentic coding benchmarks. It ships under the permissive Apache 2.0 license, which allows commercial use.

Benchmark numbers: 77.2 on SWE-bench Verified (real-world software engineering tasks requiring genuine code understanding, not sanitized synthetic tests), 59.3 on Terminal-Bench 2.0 (matching Claude Opus 4.5 on that benchmark), 1,487 on QwenWebBench. The 397B MoE predecessor scored lower on all three. Weights available on Hugging Face in BF16 and FP8 variants.

The efficiency story is the one worth tracking. A 27B dense model at FP8 precision needs roughly 27 GB for its weights alone, so it runs on a single 48 GB GPU such as the RTX 6000 Ada; a 24 GB RTX 4090 is too small without further quantization. The 397B MoE predecessor requires multi-GPU server infrastructure. Two architectural innovations are credited: a "Thinking Preservation" mechanism that retains intermediate chain-of-thought across long agentic tasks (preventing reasoning context collapse), and a hybrid attention design combining Gated DeltaNet linear attention with standard self-attention to reduce KV-cache memory footprint during inference.
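The single-GPU claim can be sanity-checked with back-of-envelope memory math. A minimal sketch: weights cost parameters × bytes per parameter, and the context-dependent KV cache is paid only by layers that keep full attention. The layer count, KV-head count, head dimension, and one-in-four full-attention ratio are hypothetical placeholders, not Qwen3.6-27B's published configuration; substitute the values from the model's config file.

```python
GB = 1e9


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights alone."""
    return params_billion * 1e9 * bytes_per_param / GB


def kv_cache_gb(
    full_attn_layers: int,
    kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: float,
) -> float:
    """K and V tensors for every full-attention layer at a given context length.

    Linear-attention layers (DeltaNet-style) keep a fixed-size state instead,
    so they are left out of this context-dependent term.
    """
    return 2 * full_attn_layers * kv_heads * head_dim * context_tokens * bytes_per_value / GB


print(f"BF16 weights: {weight_gb(27, 2):.0f} GB")  # 54 GB: too big for one 48 GB card
print(f"FP8 weights:  {weight_gb(27, 1):.0f} GB")  # 27 GB: leaves ~21 GB on a 48 GB card

# Hypothetical architecture: 64 layers, 8 KV heads of dim 128, FP8 cache, 128k context.
all_full = kv_cache_gb(64, 8, 128, 131_072, 1)
hybrid = kv_cache_gb(16, 8, 128, 131_072, 1)  # only 1 layer in 4 uses full attention
print(f"KV cache, all full attention: {all_full:.1f} GB; hybrid: {hybrid:.1f} GB")
```

Under these placeholder numbers, all-full-attention would need about 17 GB of cache at 128k tokens on top of 27 GB of FP8 weights, nearly filling a 48 GB card, while the hybrid needs about 4 GB. That is the mechanism behind the hybrid-attention claim, whatever the real layer counts turn out to be.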

If the benchmark claims hold under community evaluation — and the community will be stress-testing this within the week — this shifts the economics of agentic coding assistants meaningfully toward on-premise deployment. Apache 2.0 removes the licensing friction entirely.

So What? Pull the FP8 weights from Hugging Face and evaluate Qwen3.6-27B against your actual codebase and tooling before deciding on infrastructure for agentic network ops. A model that runs on a single workstation GPU is a different deployment conversation than one requiring a DGX node.

Sources: Simon Willison | Hugging Face Model Page | MarkTechPost Analysis

Quick Take: Anthropic Mythos Reality Check

The Register's deep dive this week finds Anthropic's Mythos vulnerability-discovery model may have been significantly overhyped. Independent researchers counted roughly 40 high/critical-severity findings — not the "thousands" suggested in Anthropic's announcement. Multiple teams reproduced the showcase vulnerability finds using smaller, publicly available models. One researcher's summary: "The bugs it found are real. The Mythos story is one of misinformation and hype." [72-hr cooldown on the main story — covered Wednesday April 22. This updates with new facts.]

The architectural lesson from the reality check: vendor AI security capability claims should receive the same benchmark scrutiny you'd apply to any model evaluation claim. Wait for independent replication before adjusting your threat model.

Sources: The Register

Quick Take: NVIDIA RTX PRO 4500 Blackwell + vGPU 20

NVIDIA released the RTX PRO 4500 Blackwell for enterprise workstation/server configurations alongside vGPU 20, which adds Multi-Instance GPU (MIG) hardware-level partitioning at workstation form factors. MIG has existed on A100/H100 datacenter cards since Ampere (2020); the advance here is bringing hardware-enforced GPU isolation (with guaranteed QoS per partition) to workstation-class hardware that engineering teams can actually afford and deploy outside a datacenter. Meaningful for small teams running shared inference experiments or fine-tuning jobs on a single node.

Sources: NVIDIA Technical Blog


Datacenter
№ 06·Datacenter


Plate V · datacenter
Datacenter row — per-rack utilization at a glance. Cool colors are slack; warmer fills are pressure.

Europe Tier II Location Shift

Covered in the Networking & Architecture section above.

ERCOT and MISO Forecast Datacenter Loads Dominating US Grid by 2030s

ERCOT (Texas) and MISO (Midwest) released their latest long-range load forecasts, both projecting that datacenter demand will constitute the majority of new electricity load growth through the 2030s. This is consistent with the IEA's US and global datacenter power projections (including the 17%-of-US-electricity-by-2030 figure covered Wednesday), but the ERCOT/MISO data is regionally specific and operationally significant. Texas and the Midwest have historically been attractive datacenter markets due to land cost, power availability, and regulatory predictability. The two forecasts signal that even these historically grid-flexible regions will face capacity strain from datacenter load density within this decade.

So What? Site selection for datacenter capacity in Texas and the Midwest now requires modeling against the ERCOT/MISO load growth curves — not just current interconnection costs. The favorable power environment that made those markets attractive may not persist through the decade.

Sources: DataCenter Dynamics

Quick Take: US Datacenter Coal Plant Effect

A consortium of environmental nonprofits published Earth Day research showing datacenter growth is slowing the US coal plant retirement schedule. Utility operators are keeping aging fossil fuel plants online longer to meet datacenter load growth that would otherwise require demand curtailment or expensive emergency capacity procurement. The AI infrastructure build-out timeline is running faster than the grid's ability to retire legacy generation.

Sources: The Register


Science
№ 07·Science


Plate VI · science
Field schematic — three-body stability under quasi-equal masses, drawn from the day's central result.

IonQ Publishes Full Blueprint for a Fault-Tolerant Trapped-Ion Computer

IonQ researchers posted a detailed preprint (arXiv 2604.19481, April 21) laying out a complete fault-tolerant quantum computing architecture for trapped-ion hardware: the "Walking Cat." This is not a "we ran a demo circuit" paper — it is a full-stack architecture specification including hardware modules, error correction protocols, micro-architecture, decoder, and compiler, with resource estimates attached to a real benchmark problem.

The Walking Cat uses Quantum Charge-Coupled Device (QCCD) chips where ions are physically shuttled to implement qLDPC codes. A dense memory instance encodes 22 logical qubits in 102 physical qubits via a [[102, 22, 9]] code — significantly leaner than surface-code architectures. The paper identifies five functional modules: memory blocks for logical qubits under continuous error correction, magic factories for non-Clifford gate resource states, cat factories for logical measurements, Bell factories for cross-chip connectivity, and a qubit factory to replace lost physical qubits.
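To put "significantly leaner" in numbers: an [[n, k, d]] code stores k logical qubits in n physical data qubits at distance d. A minimal sketch comparing the reported [[102, 22, 9]] code with a rotated surface code of the same distance, which spends d² data qubits on each logical qubit. Both counts exclude syndrome-measurement ancillas, and the surface-code baseline is our choice of comparison, not necessarily the paper's.

```python
from fractions import Fraction


def encoding_rate(n: int, k: int) -> Fraction:
    """Logical qubits per physical data qubit for an [[n, k, d]] code."""
    return Fraction(k, n)


def surface_code_data_qubits(logical: int, distance: int) -> int:
    """Rotated surface code: one [[d^2, 1, d]] patch per logical qubit."""
    return logical * distance**2


qldpc_n, qldpc_k, d = 102, 22, 9
surface_n = surface_code_data_qubits(qldpc_k, d)  # 22 * 81

print(f"qLDPC rate: {float(encoding_rate(qldpc_n, qldpc_k)):.3f}")  # 0.216
print(f"surface-code data qubits for 22 logical at d=9: {surface_n}")  # 1782
print(f"data-qubit ratio: {surface_n / qldpc_n:.1f}x")  # 17.5x
```

The saving is not free: qLDPC checks couple qubits that sit far apart, and that long-range connectivity is what shuttling ions around a QCCD chip is meant to supply.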

The honest resource estimate is what makes this paper stand out: simulating a 100-site quantum physics problem would require approximately 10,000 physical qubits and about one month of continuous runtime. That sobriety is as valuable as the engineering proposals — it calibrates expectations against real computational utility targets rather than marketing claims. Note: preprint, not yet peer-reviewed.

Connection to previous coverage: This is architecturally distinct from the IonQ photonic interconnect story (April 20) — that was about networking between quantum processors; this describes what a single fault-tolerant processor looks like internally.

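So What? The Walking Cat paper sets a concrete resource accounting framework for fault-tolerant ion-trap QC. It is worth reading for PQC timeline planning, with one caveat: the 10,000-physical-qubit estimate is for a 100-site physics simulation, not for breaking public-key cryptography, which requires far larger machines. If the estimate is representative, useful fault-tolerant computation is closer than most enterprise security planning assumes; treat it as a calibration point for the hardware trajectory rather than a cryptanalysis deadline.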

Sources: arXiv 2604.19481 | IonQ Blog | The Quantum Insider

The Fun One: A New "QR Code" for Mathematical Knots — and It's Genuinely Powerful

Mathematicians Dror Bar-Natan (University of Toronto) and Roland van der Veen (University of Groningen) published a new knot invariant that breaks a decades-long impasse in knot theory: the trade-off between computational tractability and discriminative strength.

The tool generates a hexagonal color-coded heat map, resembling a QR code, that encodes complex topological information about any knot as a polynomial in two variables. It uniquely distinguishes over 97% of knots with 18 crossings. For context: the Jones polynomial (1984) distinguishes roughly 42%; the Alexander polynomial (1923) manages about 11%. Prior techniques topped out at 15-20 crossings computationally; this one handles 300 crossings readily, with aspects computed for 600-crossing knots, an extension of analytical reach of at least 15x.
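For a sense of what "a knot as a polynomial" means, the oldest invariant in that comparison can be computed by hand. The Alexander polynomial of a knot with Seifert matrix V is det(V - tV^T), defined up to a factor of ±t^n. A minimal sketch for 2 × 2 Seifert matrices, using standard textbook matrices for the trefoil and figure-eight knots; it illustrates the classical invariant only, not Bar-Natan and van der Veen's construction.

```python
# Polynomials in t are lists of integer coefficients, lowest degree first.
Poly = list[int]


def poly_add(a: Poly, b: Poly) -> Poly:
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def poly_mul(a: Poly, b: Poly) -> Poly:
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def normalize(p: Poly) -> Poly:
    """Strip zero terms at both ends and fix the sign, because the Alexander
    polynomial is only defined up to a factor of +/- t^n."""
    while p and p[-1] == 0:
        p = p[:-1]
    while p and p[0] == 0:
        p = p[1:]
    if p and p[0] < 0:
        p = [-c for c in p]
    return p


def alexander_2x2(v: list[list[int]]) -> Poly:
    """det(V - t V^T) for a 2x2 integer Seifert matrix V."""
    # Entry (i, j) of V - t V^T is the polynomial v[i][j] - v[j][i] * t.
    m = [[[v[i][j], -v[j][i]] for j in range(2)] for i in range(2)]
    det = poly_add(poly_mul(m[0][0], m[1][1]), [-c for c in poly_mul(m[0][1], m[1][0])])
    return normalize(det)


trefoil = [[-1, 1], [0, -1]]
figure_eight = [[-1, 1], [0, 1]]
print(alexander_2x2(trefoil))       # [1, -1, 1], i.e. 1 - t + t^2
print(alexander_2x2(figure_eight))  # [1, -3, 1], i.e. 1 - 3t + t^2
```

Different polynomials prove the two knots are distinct. The weakness the article describes is the converse: many different knots share one Alexander polynomial, which is where stronger invariants earn their keep.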

The construction is inspired by a traffic-flow metaphor: model the knot as a highway network, route "vehicles" probabilistically through intersections, then extend by introducing multiple vehicle types that combine and split. The researchers conjecture the invariant is equivalent to the two-loop polynomial derived from the Kontsevich integral — if proven, that immediately establishes its full theoretical pedigree.

Why it matters beyond pure math: knot theory underlies topological quantum computing (Microsoft's anyon-based qubit approach relies on braid mathematics from the same framework). A more powerful invariant that classifies topological structures at scale is a genuine enabling tool for that program. The connection is real even if the application bridge is years away. Note: not yet peer-reviewed; covered by Quanta Magazine with researcher quotes.

Sources: Quanta Magazine


Security
Plate VII · security
Zero-trust egress — credentials are injected at the proxy boundary, never reaching the client runtime.

AI Gateway Libraries Are Now High-Value Credential Aggregators — and Attackers Know It

The LiteLLM supply chain compromise (covered in full in Top 3) reveals an architectural pattern the security community is now naming the "poisoned security scanner" attack. The structural lesson: AI gateway libraries sit at the credential intersection of every integrated service in an organization — every LLM API key, every cloud credential, every Kubernetes secret passed through the pipeline. Compromising the gateway yields all of it, not just one service. The more services an AI gateway proxies, the higher-value its compromise. This is a categorically different threat profile from a typical upstream library dependency.

The mitigation architecture: treat AI gateway libraries as production-grade dependencies requiring SBOM generation, artifact hash verification in CI/CD, and dependency scanning tools that themselves have verified provenance (the recursive lesson from this attack). SLSA provenance level 2 or higher for any dependency in your AI ops stack is no longer a theoretical security posture — it's a direct response to a demonstrated attack pattern.
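Artifact hash verification is the easiest item on that list to wire into a pipeline, and it also covers binaries pip never sees, such as a scanner downloaded in a CI step. A minimal sketch that streams a file through SHA-256 and refuses to proceed unless it matches a digest committed to the repo; it is the same kind of check `pip install --require-hashes` applies to each `--hash=sha256:...` entry. The artifact name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> None:
    """Raise if the artifact does not match the digest committed to the repo."""
    actual = sha256_file(path)
    if actual != pinned_sha256.strip().lower():
        raise RuntimeError(
            f"{path}: sha256 {actual} does not match pinned {pinned_sha256}; "
            "refusing to run this artifact"
        )


if __name__ == "__main__":
    # Demo with a throwaway file; in CI the pinned digest lives in the repo.
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "scanner.tar.gz"  # hypothetical artifact name
        artifact.write_bytes(b"vetted release")
        pinned = sha256_file(artifact)
        verify_artifact(artifact, pinned)  # matches: returns silently
        artifact.write_bytes(b"tampered release")
        try:
            verify_artifact(artifact, pinned)
        except RuntimeError as err:
            print("blocked:", err)
```

A pinned digest stops an artifact from being swapped after you vetted it; it cannot help if the artifact was already malicious when you pinned it, which is why provenance verification sits next to it in the mitigation list.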

Microsegmentation Adoption Continues Accelerating — with Identity-Based Models Displacing Agent-per-Host

Gartner's market guide projected that 60% of enterprises pursuing zero trust would be deploying more than one form of microsegmentation by 2026, up from under 5% in 2023. More importantly, the model is shifting: agentless and identity-based enforcement is replacing the agent-per-host approaches that stalled rollouts due to operational complexity. CISA's zero-trust microsegmentation guidance is being operationalized by vendors including Zero Networks, giving enterprise security architects a federal reference model for budget justification.

The forcing function is multi-cloud reality: organizations running AWS, Azure, and on-premises simultaneously cannot enforce consistent east-west policy at the subnet boundary. Workload-identity-aware enforcement at the compute layer is now the baseline expectation, not the advanced configuration.
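The difference between subnet rules and identity rules is easiest to see in a policy lookup. A minimal default-deny sketch in which rules name workload identities instead of IP ranges, so one rule holds whether a workload runs in AWS, Azure, or on premises. The SPIFFE-style ID format is our assumption about how identities are expressed, and every identity, address, and port is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    identity: str  # e.g. a SPIFFE-style ID issued by the platform (assumed format)
    ip: str        # changes across clouds; the policy never looks at it


# Default deny: only listed (source identity, destination identity, port) flows pass.
ALLOW = {
    ("spiffe://corp/ns/automation/sa/awx", "spiffe://corp/ns/dcim/sa/netbox", 443),
    ("spiffe://corp/ns/dcim/sa/netbox", "spiffe://corp/ns/db/sa/postgres", 5432),
}


def permitted(src: Workload, dst: Workload, port: int) -> bool:
    return (src.identity, dst.identity, port) in ALLOW


awx_aws = Workload("spiffe://corp/ns/automation/sa/awx", "10.20.1.15")
awx_onprem = Workload("spiffe://corp/ns/automation/sa/awx", "192.168.40.7")
netbox_azure = Workload("spiffe://corp/ns/dcim/sa/netbox", "172.16.8.30")
postgres = Workload("spiffe://corp/ns/db/sa/postgres", "10.20.9.4")

print(permitted(awx_aws, netbox_azure, 443))     # True
print(permitted(awx_onprem, netbox_azure, 443))  # True: same identity, different IP
print(permitted(awx_aws, postgres, 5432))        # False: no rule, default deny
```

In a real deployment the identity comes from an attested credential, such as the workload's mTLS certificate, rather than a field the workload reports about itself.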

Sources: Akamai/Gartner Market Guide | Zero Networks/CISA


Quick Takes
№ 09·Quick Takes


  • Datacenter coal plants: US environmental research confirms datacenter growth is slowing coal plant retirement timelines — the AI buildout speed is outpacing clean grid transition capacity. (The Register)
  • Mythos reality check: Independent researchers peg Mythos at ~40 confirmed high/critical findings vs Anthropic's "thousands" framing; multiple teams reproduced the showcase finds with smaller public models. (The Register — update to April 22 story)
  • NVIDIA RTX PRO 4500 + vGPU 20: Hardware-level GPU partitioning at workstation form factor is now available; meaningful for small teams running shared inference on a single node. (NVIDIA)
  • Kubernetes 1.36 "Haru" naming: The release notes use The Great Wave off Kanagawa artwork and haiku-style formatting. The software inside is significant (Gateway API maturation, various scheduler improvements). The art is a bonus.

Watch Today
№ 10·Watch Today


  • Community evaluation of Qwen3.6-27B: The 77.2 SWE-bench Verified score will be community-tested this week. Watch Hugging Face, Simon Willison, and the Qwen GitHub repo for real-world performance reports outside sanitized benchmarks.
  • NANOG 97 CFP: Open through April 27, with NEMOPS (Next Era of Network Management Operations) as a focus area. Submit if you have a production automation or gNMI deployment story; the community needs more practitioner content and fewer vendor slides.
  • TeamPCP npm wave response: New malicious npm packages appeared April 21 at 22:14 UTC. Watch npmjs.com advisories and Socket.dev security feed for further indicators. The worm is still active.
  • IonQ Walking Cat peer review: The preprint is on arXiv; watch for conference acceptance or journal publication to establish peer-reviewed status for the resource estimates.

Automation
№ 11·Automation

Pipeline Stats

Plate VIII · automation
Source-of-truth pipeline — intent → diff → apply → verify, idempotent on every revolution.
  • Edition: Morning Briefing — Thursday, April 23, 2026
  • Domains covered: Networking/Architecture, Network Automation, AI/ML, Datacenter, Science, Security
  • Stories: 7 primary + 4 quick takes
  • Dedup rejections: 2 (Microsoft SRv6 uSID on SONiC — covered April 13 same story; Microsegmentation 60% stat — covered April 7 same framing, reframed with CISA operationalization angle as supporting context)
  • RSS digest: Thin (top score 3.0); web search primary
  • Quality score: 4.5/5
  • 72-hr cooldown items avoided: Claude Mythos main story, QuEra Teraquop, Cloudflare intent classification, AES-128 PQC, Data Center World, ECL hydrogen, Gartner IT spending, Ansible Jinja2, all April 21-22 items