Technical Analysis & Engineering Review

Weekly Tech Report:
17 Mar 2026

Computer Science

MIT's Terahertz Near-Field Microscope Resolves Hidden Quantum Oscillations Inside Superconductors

By compressing terahertz light into sub-wavelength apertures, physicists image phonon-driven charge fluctuations that standard spectroscopy cannot reach

MIT physicists published results this week describing a new near-field terahertz microscope capable of imaging quantum mechanical oscillations buried inside superconducting materials — dynamics that have been theorised for decades but were experimentally inaccessible with prior instrumentation. The instrument compresses terahertz radiation, which spans roughly 0.1 to 10 THz and ordinarily diffracts at scales far too large for lattice-level imaging, into a nanometre-scale tip apex using a metallic near-field antenna. The resulting confined electromagnetic field couples to local charge and spin excitations at spatial resolutions well below the diffraction limit — a regime in which quantum fluctuations that are thermally averaged out in bulk measurements become resolvable as spatially distinct features.

The physics being imaged is the coupling between phonon (lattice vibration) modes and the electron pairs responsible for superconductivity. In conventional BCS superconductors, this coupling both provides the pairing mechanism and sets the ceiling on the transition temperature — the stronger the phonon-mediated attraction, the higher the critical temperature at which superconductivity emerges. In high-temperature cuprate superconductors, the pairing mechanism remains contested; multiple competing theories propose phononic, magnetic, or purely electronic origins. What all theories share is a prediction of spatially inhomogeneous charge fluctuations at sub-nanometre scales near structural defects and domain boundaries. The MIT instrument provides the first direct imaging of this inhomogeneity in real space at millikelvin temperatures, rather than inferring it from momentum-space spectroscopy.

The microscope achieves sub-10 nm spatial resolution at terahertz frequencies — more than 1,000 times sharper than the free-space diffraction limit of terahertz light — by exploiting the lightning-rod effect at a metallic tip apex to concentrate the electromagnetic field into a region smaller than a single unit cell of the crystal lattice.

The immediate implication for materials science is a new experimental handle on the pairing mechanism debate. If spatially resolved terahertz spectroscopy can identify which phonon branches are anomalously enhanced near superconducting domain boundaries — and whether that enhancement correlates with locally elevated critical temperature — it becomes possible to design materials with engineered phonon spectra rather than relying on trial-and-error chemical substitution. The longer arc toward quantum computing is also relevant: superconducting qubits are acutely sensitive to two-level system (TLS) defects at Josephson junction interfaces. The same terahertz near-field technique, adapted to operate at millikelvin qubit operating temperatures, could map TLS distributions across a fabricated qubit without destroying the device — a capability that currently requires destructive cross-sectional TEM of failed samples.

Technically, the MIT system operates by modulating the tip-sample distance at the cantilever resonance frequency and demodulating the scattered terahertz signal at higher harmonics, rejecting the far-field background that would otherwise swamp the near-field signal by several orders of magnitude. The terahertz source is a photoconductive emitter pumped by a femtosecond laser; the detector is a similar photoconductive switch gated by the same laser pulse, enabling time-domain spectroscopy with sub-picosecond temporal resolution simultaneously with the spatial scan. The combination of spatial and temporal resolution provides access to both the amplitude and phase of local charge oscillations — effectively a four-dimensional dataset that previous THz instruments could only access in one or two of these dimensions at a time.
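
The background-rejection step is easiest to see numerically. The sketch below, a simplified model rather than the MIT group's actual processing chain, treats the near-field scattering as a sharply nonlinear function of tip-sample distance, oscillates that distance at the tapping frequency, and demodulates with a software lock-in at the first few harmonics; the far-field background contributes almost entirely at the fundamental, while the nonlinear near-field term survives at the second and third harmonics. All waveform parameters are illustrative.

import numpy as np

# Illustrative parameters, not instrument values
f_tip = 250e3                                   # cantilever tapping frequency, Hz
fs = 50e6                                       # sampling rate, Hz
t = np.arange(0, 2e-3, 1 / fs)                  # 500 full tapping periods

# Tip-sample gap oscillates sinusoidally around a mean height
gap = 20e-9 + 15e-9 * np.cos(2 * np.pi * f_tip * t)       # metres

# Toy model: near-field scattering decays exponentially with the gap, while the
# far-field background varies only gently at the tapping frequency.
near_field = np.exp(-gap / 10e-9)
background = 50.0 + 5.0 * np.cos(2 * np.pi * f_tip * t)
signal = near_field + background

def lockin(sig, harmonic):
    # Demodulate at an integer harmonic of the tapping frequency
    ref = np.exp(-2j * np.pi * harmonic * f_tip * t)
    return 2 * abs(np.mean(sig * ref))

for n in (1, 2, 3):
    print(f"harmonic {n}: {lockin(signal, n):.3e}")
# The background dominates the fundamental (n = 1) but contributes essentially
# nothing at n >= 2, where the signal is set by the nonlinear near-field term.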

Strain Engineering Without Doping: Air-Cavity Substrates Supercharge Monolayer Tungsten Disulfide

Researchers reshape sub-surface geometry rather than the material itself, decoupling strain from chemical perturbation in ultra-thin semiconductor stacks

A research team published results this week demonstrating a technique for enhancing the electronic and optical properties of atomically thin semiconductors by engineering the shape of the substrate beneath them rather than modifying the semiconductor material itself. The approach deposits a monolayer of tungsten disulfide (WS₂) — a transition metal dichalcogenide (TMD) with strong excitonic properties — over a substrate that has been patterned with tiny air-filled cavities. Where the WS₂ spans a cavity, the absence of a rigid supporting surface allows the monolayer to relax into a slightly curved, strain-differentiated geometry. The lattice strain that develops as the monolayer drapes over the cavity edge modifies the local band structure, red-shifting the photoluminescence by tens of meV and increasing quantum yield in the strained regions.

The conceptual importance of the technique is that it achieves the same band-structure modification as chemical doping without introducing any foreign atoms into the lattice. Conventional doping of TMD monolayers — substituting sulfur atoms with selenium, for example, in the WS₂ lattice — introduces local scattering centres that degrade carrier mobility even as they shift the band gap. Strain engineering via substrate geometry is, in principle, fully reversible and leaves the chemical composition of the semiconductor intact. The air-cavity geometry also provides a natural photonic cavity effect: the refractive index contrast between the suspended monolayer and the air gap below enhances the local optical field strength, further amplifying the photoluminescence signal from the strained regions. In photonic applications — single-photon emitters, excitonic circuits, and quantum-dot-analogue emitters in solid-state TMD stacks — this combination of controlled strain and optical confinement in the same lithographic step is a meaningful fabrication simplification.

The scalability of the approach is the key open question. Air-cavity substrates patterned by electron-beam lithography are straightforward to characterise but expensive to manufacture at wafer scale. The transfer of large-area CVD-grown TMD films over cavity arrays without tearing or delamination is a known yield problem: surface tension during the wet-transfer process collapses cavities smaller than roughly 500 nm. Dry-transfer techniques using polymer stamps have better yield for sub-200 nm features but introduce contamination at the interface that partially suppresses the optical enhancement. The research team used EBL-patterned cavities of 300–800 nm diameter, which places the technique in a fabrication regime accessible to academic cleanrooms but not yet to semiconductor production lines. The path to wafer-scale deployment will require either cavity-definition by nanoimprint lithography (which can pattern at these length scales in parallel) or a pivot to using naturally occurring step edges and grain boundaries in epitaxial substrates as the strain sources.

TMD Material | Band Gap (unstrained) | Photoluminescence | Primary Application | Strain Sensitivity
WS₂ (monolayer) | ~2.0 eV (direct) | Strong, tunable | Optical emitters, LEDs | ~45 meV / % strain
MoS₂ (monolayer) | ~1.8 eV (direct) | Moderate | Transistors, sensors | ~72 meV / % strain
WSe₂ (monolayer) | ~1.65 eV (direct) | Strong | Valleytronics, spin qubits | ~36 meV / % strain
MoSe₂ (monolayer) | ~1.65 eV (direct) | Moderate | Photodetectors | ~50 meV / % strain

The broader research context is a years-long effort to construct functional excitonic circuits in two-dimensional materials — devices where information is carried not by electrons but by excitons (bound electron-hole pairs) that can be guided, switched, and emitted in an on-chip photonic architecture. Strain gradients in TMD monolayers produce a funnelling effect: excitons drift toward regions of lower band gap (higher strain), providing a passive routing mechanism without any applied electric field. Air-cavity substrates provide a reproducible and lithographically addressable method for creating such gradients, which is the missing manufacturing link between laboratory demonstrations of excitonic funnelling and practical excitonic integrated circuits.
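
The funnelling mechanism can be made concrete by mapping an assumed strain profile to a local band gap using the monolayer WS₂ values from the table above (roughly 2.0 eV unstrained, about 45 meV of shift per percent of strain). The profile shape and peak strain in the sketch below are illustrative assumptions, not measurements from the paper.

import numpy as np

# Assumed Gaussian strain profile across a cavity edge (position in nm)
x = np.linspace(-500, 500, 1001)
peak_strain_pct = 1.2                            # assumed peak strain, %
strain = peak_strain_pct * np.exp(-(x / 150.0) ** 2)

# Monolayer WS2 values from the table above
E_g0 = 2.0                                       # unstrained band gap, eV
sensitivity = 0.045                              # gap shift, eV per % strain

# Tensile strain red-shifts the local gap; excitons drift toward the minimum
E_g = E_g0 - sensitivity * strain
funnel_centre = x[np.argmin(E_g)]

print(f"maximum red-shift: {1000 * sensitivity * peak_strain_pct:.0f} meV")
print(f"gap minimum at x = {funnel_centre:.0f} nm, where excitons accumulate")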

Software Development

NVIDIA GTC 2026: Vera Rubin Platform, Dynamo 1.0, and the Agentic Scaling Phase

Seven co-designed chips, five rack types, and a new inference operating system — Jensen Huang reframes AI factories as the unit of compute for the agentic era

NVIDIA's annual GTC developer conference ran March 16–19 in San Jose, and Jensen Huang's keynote contained a density of announcements that warrants careful disaggregation. The headline hardware announcement was the Vera Rubin platform — the successor to Blackwell, architected from the ground up for inference and agentic workloads rather than the training throughput that characterised earlier GPU generations. The platform comprises seven co-designed chips in full production: the Rubin GPU, the Vera CPU, the Groq 3 LPU (from the $20 billion Groq acquisition completed in December), the NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and the Spectrum-6 Ethernet switch. The NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs on NVLink 6, and achieves 10× higher inference throughput per watt at one-tenth the cost per token versus Blackwell — a figure that, if it holds under production workloads, fundamentally alters the economics of large-context LLM serving.

The Groq 3 LPU integration is the most architecturally distinctive element of the platform. The chip carries 500 MB of on-die SRAM and achieves 150 TB/s of internal bandwidth — a design philosophy directly opposed to NVIDIA's HBM-centric memory architecture. Where HBM stacks provide high external bandwidth to a GPU die that caches only a fraction of model weights on-die, Groq's LPU architecture aims to hold the entire KV cache for a running context in on-chip SRAM, eliminating the HBM round-trip for cache reads during auto-regressive decoding. In practice this matters most for long-context inference — the multi-step reasoning chains that define agentic workflows — where the KV cache for a 128k-token context window, stored at the model's working precision, can exceed the on-chip SRAM capacity of conventional GPUs, forcing expensive HBM accesses on every decode step. NVIDIA is positioning the Groq 3 LPX rack deployed alongside the NVL72 as a specialised decode accelerator, with the NVL72 handling prefill.
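
The arithmetic behind the long-context claim is worth sketching. The calculator below estimates the per-request KV cache from layer count, KV-head count, head dimension, and cache precision; the configuration values are assumptions chosen to represent a large grouped-query-attention model, not published figures for any particular model, and the comparison against the Groq 3 LPU's 500 MB of on-die SRAM is simple division.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value):
    # K and V tensors, per layer, per KV head, per token
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len

# Assumed architecture for a large GQA model (illustrative, not a real spec)
cfg = dict(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_value=2)   # FP16 cache

per_token = kv_cache_bytes(1, **cfg)
full_ctx = kv_cache_bytes(128_000, **cfg)

print(f"KV cache per token:          {per_token / 1e3:.0f} KB")
print(f"KV cache at 128k tokens:     {full_ctx / 1e9:.1f} GB")
print(f"500 MB SRAM dies to hold it: {full_ctx / 500e6:.0f}")
# At these assumptions the full cache runs to tens of gigabytes, far beyond the
# on-die SRAM of a conventional GPU, so autoregressive decoding otherwise pays
# an HBM round-trip to re-read the cache for every generated token.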

$1T Jensen Huang's projected Blackwell + Vera Rubin purchase orders through 2027
10× NVL72 inference throughput per watt vs Blackwell
150 TB/s Groq 3 LPU internal bandwidth
336B Rubin GPU transistor count, with 288 GB HBM4

On the software side, NVIDIA announced general availability of Dynamo 1.0, described as an "operating system for AI factories." The framing is deliberate: Dynamo manages the scheduling, routing, and resource allocation of inference requests across a heterogeneous pool of NVL72, Groq LPX, and Vera CPU racks, applying disaggregated prefill-decode scheduling, KV cache migration between racks, and SLO-aware request routing. The KV cache migration capability — moving cached context representations from one physical rack to another without re-computing the prefill — is the critical enabling mechanism for multi-agent workflows where different specialist sub-agents may be hosted on different hardware. Without cross-rack KV cache portability, every agent hand-off would require a full prefill re-compute, incurring both latency and cost that make complex multi-agent pipelines economically prohibitive at scale.
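
The scheduling pattern is easier to reason about in code. The sketch below is a hand-written illustration of disaggregated prefill and decode with cross-rack KV cache hand-off; the class and method names are invented for this example and are not Dynamo's API, and the routing policy is reduced to a shortest-queue heuristic to keep the idea visible.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int
    slo_ms: float                      # latency target for first token
    kv_handle: str = ""                # opaque reference to an already-built cache

class InferenceFabric:
    # Illustrative scheduler only; names and structure are not Dynamo's API
    def __init__(self, prefill_racks, decode_racks):
        self.prefill_racks = prefill_racks    # e.g. NVL72 racks (prefill-heavy)
        self.decode_racks = decode_racks      # e.g. Groq LPX racks (decode-heavy)

    def submit(self, req):
        if not req.kv_handle:
            # No cached context yet: run prefill once and record a cache handle
            rack = min(self.prefill_racks, key=lambda r: r["queue_ms"])
            req.kv_handle = f"kv://{rack['name']}/{id(req)}"
        # Migrate the cache to the chosen decode rack instead of re-running prefill
        target = min(self.decode_racks, key=lambda r: r["queue_ms"])
        self.migrate_kv(req.kv_handle, target)
        return f"decode on {target['name']} within {req.slo_ms} ms SLO"

    def migrate_kv(self, handle, target_rack):
        pass    # stand-in for the cross-rack cache transfer over the interconnect

fabric = InferenceFabric(
    prefill_racks=[{"name": "nvl72-0", "queue_ms": 12}],
    decode_racks=[{"name": "lpx-0", "queue_ms": 3}, {"name": "lpx-1", "queue_ms": 9}],
)
print(fabric.submit(Request(prompt_tokens=96_000, slo_ms=250)))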

For developers building on top of the Vera Rubin platform, the practical entry point is the updated NVIDIA Agent Toolkit and the NemoClaw integration with the OpenClaw agent platform. OpenClaw — the open-source autonomous agent framework that surged in early 2026 — is now receiving first-party NVIDIA support through the NemoClaw stack, including CUDA-accelerated tool execution and sandboxed sub-agent orchestration. Jensen Huang also previewed Feynman, the next-generation architecture beyond Vera Rubin, though details were sparse — largely a roadmap signal that NVIDIA's hardware cadence extends well into 2028. Among the first cloud providers expected to deploy Vera Rubin NVL72 instances later in 2026 are AWS, Google Cloud, Microsoft Azure, and OCI.

Next.js 16.2, Cursor Composer 2, and the OpenAI–Astral Acquisition: Three Signals in the AI Coding Stack

An 87% faster dev server, a fine-tuned Kimi K2.5 coding model priced 86% below its predecessor, and a Python toolchain acquisition that mirrors Anthropic's Bun play

Three announcements this week collectively define the current competitive topology of AI-assisted software development. Taken individually, each is a meaningful product update. Read together, they reveal a structural pattern: every layer of the developer toolchain — the frontend framework, the IDE-native coding model, and the runtime/tooling substrate — is being pulled simultaneously toward tighter AI integration, lower inference cost, and vertical control by the major AI labs.

Next.js 16.2, released on March 18, leads with a number that earns its headline: dev server startup is roughly 87% faster than 16.1, translating to approximately a 4× time-to-URL improvement on the default application. The mechanism is Turbopack compiling lazily — deferring module compilation until the browser requests a specific route, rather than eagerly compiling the entire dependency graph on startup. For large Next.js applications with hundreds of routes and heavy server component trees, this makes the difference between a 30-second wait and a 4-second iteration loop. The rendering side of the release is equally significant: a contribution to React itself replaces the JSON.parse reviver approach to Server Components payload deserialisation with a two-step plain parse plus JavaScript walk, eliminating costly C++/JS boundary crossings and delivering 25–60% faster HTML rendering in real applications. The improvement is non-breaking and requires no configuration changes — upgrading the Next.js and React versions is sufficient.

# The dev experience improvements in 16.2 are purely additive
# No configuration changes required to get the rendering gains
npx @next/codemod@canary upgrade latest

# New in 16.2: AGENTS.md scaffolding in create-next-app
npx create-next-app@latest my-app
# → generates AGENTS.md with context for coding agents
# → browser log forwarding to terminal (agent-friendly debugging)
# → next-browser (experimental) for in-browser agent DevTools

The AI improvements in 16.2 deserve specific attention: the release introduces AGENTS.md scaffolding in create-next-app (providing structured context that coding agents can consume when modifying the project), browser log forwarding to the development terminal (so an agent running in a headless context can observe client-side errors without browser access), and experimental Next.js DevTools MCP integration for AI agents to query route metadata, component trees, and build diagnostics directly. These are not cosmetic additions. They represent Vercel's architectural position that the primary user of the development server is increasingly an autonomous coding agent rather than a human developer, and that the tooling surface should be designed accordingly.

Cursor's release of Composer 2 on March 19 is the most direct competitive statement of the week against OpenAI and Anthropic. The model is a fine-tuned variant of the open-source Kimi K2.5 architecture, trained through continued pretraining on code-only data followed by reinforcement learning on long-horizon multi-step coding tasks. The results on standard benchmarks are measured but real: 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0 (surpassing Claude Opus 4.6's 58.0 and Claude Opus 4.5's 52.1), and 73.7 on SWE-bench Multilingual. GPT-5.4 still leads Terminal-Bench 2.0 at 75.1 in its highest configuration, and Cursor makes no claim to universal superiority — the pitch is capability per dollar. Composer 2 Standard costs $0.50/$2.50 per million input/output tokens, versus the predecessor Composer 1.5 at $3.50/$17.50 — an 86% price reduction on both input and output that repositions Composer 2 as the default cost-efficient option for Cursor's one million daily active users.

Model | CursorBench | Terminal-Bench 2.0 | SWE-bench Multilingual | Input $/M tok
GPT-5.4 (high) | – | 75.1 | – | ~$15.00
Composer 2 Fast | 61.3 | 61.7 | 73.7 | $1.50
Claude Opus 4.6 | – | 58.0 | – | ~$15.00
Composer 2 Standard | 61.3 | 61.7 | 73.7 | $0.50
Composer 1.5 | 44.2 | 47.9 | 65.9 | $3.50

The third announcement — OpenAI's agreement to acquire Astral, the company behind uv, Ruff, and ty — is the most structurally interesting for the Python ecosystem's long-term governance. Astral's toolchain has, in roughly two years, displaced pip, virtualenv, pyenv, flake8, black, isort, and mypy as the default choice for new Python projects, accumulating hundreds of millions of downloads per month in the process. The acquisition mirrors Anthropic's December 2025 acquisition of Bun (the JavaScript runtime that powers Claude Code's CLI), though the strategic rationale differs somewhat: where Bun was a core runtime dependency of Claude Code that Anthropic brought in-house to guarantee maintenance continuity, Astral represents a bid by OpenAI to embed its Codex agent directly into the dependency management, linting, and type-checking steps of every Python project — not merely the code generation step. The deal terms were not disclosed and regulatory approval is pending; OpenAI has committed to keeping Astral's open-source tools actively maintained post-close.

The pattern is now clear: Anthropic acquires Bun (JavaScript runtime), OpenAI acquires Astral (Python toolchain). The coding agent wars are being fought not just at the model layer but at the developer infrastructure layer — whoever owns the toolchain owns the agent's eyes and hands inside the codebase.

Stripe's Minions Ship 1,300 Pull Requests per Week: What Unattended Coding Agents Look Like in Production

One-shot sandboxed agents, sub-10-second devbox spin-up, and human review as the only gate — Stripe's architecture for autonomous software development at scale

Stripe disclosed this week that its internal AI coding agents, called Minions, are now generating over 1,300 pull requests per week, with every line of code written by the agents and every PR reviewed by a human before merge. The disclosure is notable not for the volume — which will seem conservative in a few years — but for the specific architectural choices Stripe made in deploying unattended agents at this scale in a payments infrastructure context where a single defective PR can affect millions of transactions.

Minions are explicitly not interactive copilots. The design pattern is one-shot and asynchronous: an engineer submits a task specification through Slack, a CLI, or a web interface, and the agent takes complete ownership of the task from that point — reading the relevant codebase context, writing the implementation, generating tests, producing documentation, and opening a pull request — without further engineer interaction during execution. Each Minion runs in an isolated, pre-warmed devbox that spins up in under 10 seconds. The pre-warming is architecturally significant: it means the agent environment is not provisioned on demand (which would introduce cold-start latency that disrupts the async workflow) but is maintained in a ready state, implying Stripe is carrying a standing pool of idle compute as an operational cost of the system. The isolation guarantee is equally important in a payments context: each agent session has no network access beyond approved internal endpoints, preventing a compromised or hallucinating agent from exfiltrating code or making unauthorised external calls.
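
The pre-warmed pool is the piece that turns agent dispatch into a sub-10-second operation, and the idea generalises beyond Stripe. The snippet below is a generic illustration of a standing pool of ready sandboxes that is replenished in the background as agents claim them; it is not Stripe's implementation, and every class and parameter name is invented for the example.

import queue
import threading
import time

class DevboxPool:
    # Generic pre-warmed sandbox pool; illustrative, not Stripe's implementation
    def __init__(self, target_size=4):
        self.ready = queue.Queue()
        self.target_size = target_size
        threading.Thread(target=self._replenish, daemon=True).start()

    def _provision(self):
        time.sleep(2.0)        # stand-in for image boot, repo checkout, cache warm-up
        return {"id": f"devbox-{time.monotonic():.0f}", "network": "internal-allowlist-only"}

    def _replenish(self):
        # Keep the pool topped up so acquisition never pays the cold-start cost
        while True:
            if self.ready.qsize() < self.target_size:
                self.ready.put(self._provision())
            else:
                time.sleep(0.1)

    def acquire(self, timeout=10.0):
        # A submitted task claims a warm devbox; this returns almost immediately
        # as long as replenishment keeps pace with the task arrival rate
        return self.ready.get(timeout=timeout)

pool = DevboxPool()
time.sleep(9)                  # allow the background thread to warm the pool
box = pool.acquire()
print("acquired", box["id"], "with network policy", box["network"])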

The human review gate — the only gate — is a deliberate architectural choice that reflects a specific theory of where human judgment adds value in an agentic code production pipeline. Stripe is not using automated test suites as a merge gate beyond what would be applied to any human-authored PR. The implicit claim is that experienced human code reviewers, reviewing a complete PR with full diff context, can reliably detect the quality and safety issues that matter in a payments codebase — and that adding further automated review stages would introduce review latency that undermines the productivity case for agents without catching failure modes that reviewers miss. This is a reasonable position for a company with Stripe's engineering culture, but it transfers poorly to organisations where reviewer quality is more variable or where the domain (security-critical cryptography, regulatory compliance code) requires specialist knowledge that reviewers may not reliably possess.

Hardware Engineering

Inside the Vera Rubin NVL72: 336B-Transistor GPU, 288 GB HBM4, and the Kyber Vertical Tray Design

From the Rubin GPU's memory subsystem to the Kyber preview for Vera Rubin Ultra — a hardware engineer's reading of GTC 2026's silicon announcements

The Vera Rubin platform deserves a second, hardware-focused pass beyond the software-layer analysis above. The Rubin GPU die carries 336 billion transistors — a transistor count that reflects both the move to Samsung's 4nm process node (shared with the Groq 3 LPU) and the architectural decision to integrate the NVLink 6 fabric interface die-to-die with the compute die, rather than routing off-package through a traditional PCIe or SXM connector. The memory subsystem ships with 288 GB of HBM4 per GPU, providing 22 TB/s of memory bandwidth per chip — a bandwidth density that, across a full NVL72 rack, approaches 1.6 PB/s of aggregated memory bandwidth before NVLink interconnect is factored in. For the prefill phase of large-context inference, where the entire prompt must be read from memory and processed in parallel, this bandwidth figure directly determines the latency floor: a 128k-token context with a 70B parameter model requires roughly 17 GB of KV cache data per request; at 22 TB/s, traversing that cache in HBM takes well under a millisecond, whereas moving the same 17 GB over a 100 GbE network link would take more than a second.
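
Putting numbers on that latency floor requires only the figures in the paragraph above plus one assumption: that the prefill-time cache traffic is bandwidth-bound rather than compute-bound.

# Figures from the text above; the bandwidth-bound assumption is the only addition
kv_cache_gb = 17            # ~17 GB of KV cache data for a 128k-token request
hbm_bw_tbps = 22            # HBM4 bandwidth per Rubin GPU, TB/s
ethernet_gbps = 100         # a 100 GbE link, for comparison

t_hbm_ms = kv_cache_gb / (hbm_bw_tbps * 1000) * 1000      # milliseconds on-package
t_eth_s = kv_cache_gb * 8 / ethernet_gbps                 # seconds over the network

print(f"HBM traversal of the cache: {t_hbm_ms:.2f} ms")   # ~0.77 ms
print(f"same 17 GB over 100 GbE:    {t_eth_s:.2f} s")     # ~1.36 s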

The Vera CPU, positioned as the successor to NVIDIA's Grace CPU, is an Arm-based design with LPDDR5 memory rather than HBM, targeting the CPU-intensive portions of agentic workflows: environment simulation, reinforcement learning rollout generation, tool execution, and the orchestration logic that routes sub-agent tasks. The design offers twice the energy efficiency of comparable x86 server CPUs and 50% faster single-thread performance, positioning it not as a general-purpose server processor replacement but as a co-processor optimised for the specific access patterns of agentic RL workloads — random-access, low-reuse reads over large state tables that would thrash GPU L2 caches if run on the Rubin die itself. The Vera CPU Rack integrates 256 liquid-cooled Vera CPUs, providing the CPU-side environment pool that RL training requires: a dense, energy-efficient compute substrate for the hundreds of concurrent simulation environments that generate training signal for each GPU step.

Jensen Huang previewed Kyber — NVIDIA's next rack architecture after Rubin — featuring 144 GPUs in vertically oriented compute trays designed to reduce latency through shorter inter-GPU signal paths and to increase rack density by eliminating the horizontal cable runs that dominate cooling and space budgets in current NVL72 deployments.

The Kyber preview is worth reading carefully for what it implies about the physical engineering constraints NVIDIA is running into at NVL72 scale. Horizontal GPU trays in current rack configurations require copper NVLink cables of varying lengths to reach all 72 GPU peers — length variation introduces timing skew that the NVLink 6 protocol must absorb through elastic buffers, adding latency. Vertical tray orientation allows all inter-GPU NVLink connections within a tray column to be of near-identical length, eliminating the skew problem and enabling tighter clock margins. The density benefit is substantial: vertical trays can be cooled by a top-to-bottom liquid flow path without requiring horizontal airflow cross-sections, which are the primary constraint on current rack depth. NVIDIA has not committed to a specific date for Kyber availability, but its inclusion in the keynote as a named programme signals a manufacturing readiness horizon of late 2027.
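
The skew argument reduces to one line of physics: signals in copper cabling propagate at a sizeable fraction of the speed of light, so centimetres of length mismatch become fractions of a nanosecond of arrival-time spread. The propagation velocity and length mismatches below are generic assumptions, not NVLink 6 specifications.

C = 299_792_458              # speed of light in vacuum, m/s
v = 0.7 * C                  # assumed propagation velocity in copper cabling

# Assumed spread of cable lengths needed to reach all 72 peers from one tray
for mismatch_m in (0.1, 0.5, 1.0, 2.0):
    skew_ns = mismatch_m / v * 1e9
    print(f"{mismatch_m:>4.1f} m length mismatch -> {skew_ns:.2f} ns arrival skew")
# Roughly 5 ns of skew per metre of mismatch, which is what the elastic buffers
# must absorb and what near-identical vertical runs are intended to eliminate.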

Platform | GPU | HBM | Memory BW / GPU | GPUs per NVL Rack | Availability
Blackwell | B200 | HBM3e, 192 GB | 8 TB/s | 72 (NVL72) | HVM now
Vera Rubin | R100 | HBM4, 288 GB | 22 TB/s | 72 (NVL72) | H2 2026
Vera Rubin Ultra (Kyber) | R200 (est.) | HBM4e (est.) | TBD | 144 (vertical) | 2027

The 100% liquid cooling mandate for Vera Rubin NVL72 is a significant story for data centre design that has received less coverage than the GPU specs. Air-cooled GPU clusters at Blackwell density already push the thermal limits of most hyperscale data hall designs, requiring precision airflow management, hot-aisle containment, and CRAC unit placement that severely constrains rack placement flexibility. The transition to full liquid cooling — rear-door heat exchangers or direct liquid-to-chip cold plates — eliminates the airflow constraint but introduces a new facility requirement: chilled water loops rated for the rack power density of an NVL72, which at peak inference load approaches 80 kW per rack. Microsoft's Fairwater AI superfactory sites, announced at GTC as early adopters of Vera Rubin NVL72, are being designed from the ground up for this power and liquid cooling specification — a facility design that differs fundamentally from retrofitting existing colocation data centres.
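
The facility requirement can be made concrete with standard heat-transfer arithmetic: removing 80 kW through a liquid loop fixes the coolant flow rate once a supply-to-return temperature rise is chosen. The 10 K rise below is an assumed design value, not a figure from the Fairwater specification.

rack_power_w = 80_000        # peak NVL72 rack load from the text above
cp_water = 4186              # specific heat of water, J/(kg*K)
delta_t_k = 10               # assumed supply-to-return temperature rise, K

mass_flow_kg_s = rack_power_w / (cp_water * delta_t_k)
litres_per_min = mass_flow_kg_s * 60       # water is ~1 kg per litre

print(f"required coolant flow: {mass_flow_kg_s:.2f} kg/s ≈ {litres_per_min:.0f} L/min per rack")
# Close to two litres of water per second, per rack; multiplied across a data
# hall, this is the chilled-water plant capacity that separates purpose-built
# sites like Fairwater from retrofitted air-cooled facilities.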

Chip Export Controls, the Frore Systems $143M Cooling Raise, and the Thermal Wall in Dense AI Clusters

As governments tighten AI chip access and GPU clusters approach 80 kW per rack, the picks-and-shovels layer — cooling, power, interconnect — emerges as the next investment frontier

Two hardware-adjacent stories from this week's news cycle deserve to be read together, because they both describe constraints on the AI infrastructure buildout that are not captured by GPU benchmark comparisons. The first is the ongoing tightening of U.S. government export controls on AI chips, which this week saw a coalition of tech trade associations back Anthropic in its legal dispute over the Pentagon's move to blacklist the company as a supply-chain risk. The case is structurally important beyond the specific parties: it tests whether the U.S. government can use procurement decisions — rather than export regulations — to reshape AI model behaviour and deployment constraints. The question for the broader hardware ecosystem is whether sales of NVIDIA's export-controlled Blackwell and Vera Rubin product lines to non-approved geographies will face analogous procurement-based restrictions as the U.S. government formalises its AI procurement doctrine.

The second story is the $143 million Series C raise by Frore Systems, a startup developing solid-state active cooling technology for high-power-density chips, at a valuation of approximately $1.64 billion. Frore's AirJet silicon cooling devices use piezoelectric actuators to move air at chip-package level — eliminating the macro-scale fans and heat sinks that currently occupy the majority of server volume in GPU clusters. At first glance this appears adjacent to the liquid cooling trajectory of Vera Rubin NVL72. The more precise reading is that Frore's technology targets the edge of the density curve: edge AI servers, robotics compute modules, and on-device inference accelerators where liquid cooling loops are not available and fan-based solutions are too loud or too large. As NVIDIA's Jetson and AGX Thor platforms push more inference capability to the edge for physical AI and autonomous vehicle applications, the thermal management of chip-scale compute becomes the primary form-factor constraint rather than the compute capability itself.

$143M Frore Systems Series C raise — cooling as AI infrastructure
80 kW Estimated peak power per Vera Rubin NVL72 rack
$1.64B Frore valuation — billion-dollar thermal management market
28 Cities in NVIDIA/Uber Drive AV rollout by 2028

The architectural consequence of the thermal wall for dense AI inference clusters is becoming visible in data centre design decisions. The shift from air to liquid cooling for GPU racks has been discussed for years but is now being executed at scale: Microsoft's Fairwater superfactory design, disclosed at GTC, is a facility designed entirely around liquid-cooled NVL72 deployment — not a retrofitted air-cooled data hall with liquid cooling added as an afterthought. The capital expenditure implications are significant: liquid-cooled data halls require a dedicated chilled water plant, leak detection infrastructure at each rack position, and maintenance procedures for coolant loop connections that add to the operational cost per rack. The payoff is density: liquid cooling can remove heat at 50–100× the volumetric efficiency of air, enabling rack power densities of 100 kW and above that are physically impossible with forced-air cooling at any airflow velocity that keeps fan noise within OSHA limits.

On the autonomous vehicle side, GTC brought confirmation that NVIDIA's Drive AV software platform is being adopted at scale: Uber announced a partnership to launch a fleet powered by Drive AV across 28 cities on four continents by 2028, starting with Los Angeles and San Francisco. Nissan, BYD, Geely, Isuzu, and Hyundai were listed as building Level 4 autonomous vehicles on the Drive Hyperion programme. The hardware implication is a sustained demand signal for AGX Thor compute modules — NVIDIA's automotive-grade SoC — in volumes large enough to matter for the company's non-datacenter revenue line, which has lagged the datacenter division significantly over the past three years. If Uber's 28-city timeline holds, the vehicle compute demand for that fleet would absorb a meaningful fraction of AGX Thor production capacity in 2027–2028.