RTX Spark turns smartphone-style silicon into Windows AI

NVIDIA’s RTX Spark is heading to premium Windows laptops and workstations later this year, pairing an Arm-based Grace Blackwell superchip with a Blackwell GPU and a rare 128GB unified memory pool. It’s designed for serious on-device AI—moving beyond the “promi
The modern Windows AI pitch has been hard to cash in on. For years, owners of Snapdragon X-powered PCs have enjoyed impressive day-to-day performance and strong battery life, only to hit a blunt limitation: advanced on-device models are basically out of reach when you only have 16GB of RAM and no truly viable accelerator.
NVIDIA’s RTX Spark is built to change that story, and it does so by borrowing the logic behind smartphone chips—then scaling it for server-like workloads.
The RTX Spark platform is set to debut in a wave of premium Windows laptops later this year. Early designs have already been announced across Microsoft Surface, ASUS, Dell, HP, Lenovo, and MSI. The product lineup is broad: thin-and-light 14-inch creator laptops, larger 16-inch workstations, and even mini-desktop PCs.
At the center are a unified-memory approach and Blackwell GPU technology that NVIDIA has positioned as an AI-focused engine, not a gaming afterthought. The standout number is the one most people won’t be used to seeing on a laptop: 128GB of unified system memory.
That kind of memory is where the platform’s promise gets concrete. NVIDIA says its 128GB of unified memory is sufficient to hold a 120-billion-parameter AI model. It also compares that capacity to other known model sizes: GPT-OSS 120B is around 80GB, and NVIDIA Nemotron 3 Super is 83GB. By contrast, it notes that Google’s on-device mobile AI models fit in less than 4GB of RAM, an illustration of the gap between “pocketable” models and server-tier inference.
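As a rough check on those figures, a model's weight footprint is approximately its parameter count multiplied by the bytes stored per parameter. The Python sketch below is a back-of-the-envelope estimate only: the function name `weight_footprint_gb` and the chosen precisions are illustrative assumptions, and it ignores the KV cache, activations, and runtime overhead.

```python
def weight_footprint_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


UNIFIED_MEMORY_GB = 128  # capacity quoted in the article

# Compare a 120-billion-parameter model at three common precisions.
for bits in (16, 8, 4):
    size_gb = weight_footprint_gb(120, bits)
    fits = size_gb <= UNIFIED_MEMORY_GB
    print(f"{bits}-bit weights: {size_gb:.0f} GB, fits in 128 GB: {fits}")
```

Under these assumptions, a 120-billion-parameter model needs about 240GB at 16 bits, 120GB at 8 bits, and 60GB at 4 bits, so it only fits in a 128GB pool once it is quantized.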
Under the hood, RTX Spark leans on a superchip NVIDIA calls the N1X, also known as the GB10 Grace Blackwell Superchip. This is the same GB10 that already powers the $4,700 DGX Spark system, which runs NVIDIA’s Linux-based DGX OS rather than Windows.
The CPU side of that GB10 uses a modern Armv9 design, the same architecture class found in high-end phone chipsets. NVIDIA’s N1X implementation uses 10 Arm Cortex-X925 cores and 10 Cortex-A725 cores, for 20 total CPU cores. The Cortex-X925 launched in 2024, and it was found in last year’s MediaTek Dimensity 9400 for smartphones, though in a single-big-core configuration.
NVIDIA also says MediaTek helped design the CPU inside RTX Spark, which is part of why the chip carries familiar building blocks from the mobile world. RTX Spark pushes those cores to higher clocks than typical smartphone configurations: the X925 runs at 4.0GHz and the A725 runs at 2.85GHz. NVIDIA also describes a cache setup of up to 2MB of L2 for the X925 and 512KB of L2 for the A725, paired with 16MB of L3 and 16MB of system cache.
The CPU horsepower matters—but the real differentiator is how the CPU and GPU share memory.
To tie the two together, RTX Spark uses an NVLink-C2C interconnect. NVIDIA’s claim is that the memory link delivers up to 600GB/s of bidirectional bandwidth between the CPU and GPU. The point is to allow both to share a unified address space with virtually no overhead.
NVIDIA also frames NVLink-C2C as a practical alternative to splitting large AI workloads between system memory and GPU memory over a traditional bus. It says the interconnect is roughly 5x faster than PCIe Gen5’s bidirectional bandwidth, which it calls a potential bottleneck if big models must be divided.
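The 5x comparison can be sanity-checked with simple arithmetic. The sketch below assumes a full PCIe Gen5 x16 link at 32 GT/s per lane with 128b/130b encoding; the article does not say which PCIe configuration NVIDIA used for its comparison, and protocol overhead is ignored.

```python
PCIE5_GT_PER_S = 32      # PCIe Gen5 per-lane transfer rate
LANES = 16               # assumption: a full x16 link
ENCODING = 128 / 130     # 128b/130b line encoding efficiency

# Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
pcie5_per_direction_gbs = PCIE5_GT_PER_S * LANES * ENCODING / 8
pcie5_bidirectional_gbs = 2 * pcie5_per_direction_gbs

NVLINK_C2C_GBS = 600     # bidirectional figure quoted in the article

ratio = NVLINK_C2C_GBS / pcie5_bidirectional_gbs
print(f"PCIe Gen5 x16: about {pcie5_bidirectional_gbs:.0f} GB/s bidirectional")
print(f"NVLink-C2C advantage: about {ratio:.1f}x")
```

With those assumptions the PCIe link comes out near 126GB/s in both directions combined, which puts the 600GB/s claim at a little under five times faster and in line with the "roughly 5x" description.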
The memory choice is another tradeoff—and it’s one that suggests the priority is inference, not peak graphics performance. RTX Spark uses LPDDR5X RAM, which NVIDIA notes has an effective memory bandwidth of 273GB/s. That’s much lower than the 768GB/s or so found on graphics cards with dedicated GDDR6/7 memory.
So if you’re expecting RTX Spark to look like a desktop gaming rig, the design doesn’t quite line up that way. Still, NVLink-C2C enables the CPU and GPU to share the 128GB package-level LPDDR5X memory pool for apps, graphics, and AI workloads that demand extreme memory performance.
That brings the conversation back to what people actually need when they try to run big models locally: enough memory for the model weights, plus enough bandwidth to move data quickly enough to make it usable.
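A common rule of thumb makes the bandwidth half of that tradeoff concrete: when generating text for a single user, each new token requires reading the model's active weights from memory once, so bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below is an estimate under stated assumptions, not a benchmark. It assumes a hypothetical dense model with 60GB of weights, and the function name `max_tokens_per_second` is illustrative. Mixture-of-experts models read only part of their weights per token and can exceed this ceiling.

```python
def max_tokens_per_second(bandwidth_gbs: float, weights_read_gb: float) -> float:
    """Upper bound on decode speed when every token reads all weights once."""
    return bandwidth_gbs / weights_read_gb


LPDDR5X_GBS = 273   # RTX Spark memory bandwidth quoted in the article
GDDR_GBS = 768      # approximate discrete-GPU figure quoted in the article
MODEL_GB = 60       # assumption: hypothetical dense model, 4-bit weights

for label, bandwidth in (("LPDDR5X", LPDDR5X_GBS), ("GDDR", GDDR_GBS)):
    ceiling = max_tokens_per_second(bandwidth, MODEL_GB)
    print(f"{label}: at most about {ceiling:.1f} tokens/s")
```

By this estimate the 273GB/s pool tops out at roughly 4.5 tokens per second for such a model. The faster GDDR figure would allow more, but only on a card with enough VRAM to hold 60GB of weights in the first place, which is the capacity-versus-speed tradeoff the platform is built around.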
The processing engine for those workloads is an integrated Blackwell GPU. RTX Spark uses the same Blackwell GPU architecture that powers NVIDIA’s 5000-series gaming GPUs. NVIDIA says the GPU in RTX Spark includes 6,144 CUDA cores, matching the GeForce RTX 5070 on paper.
But there are constraints. Lower memory bandwidth and a tighter power envelope mean gaming performance is expected to fall well short of a desktop RTX 5070. Even so, NVIDIA says the integrated GPU supports DLSS 4.5, Reflex, and hardware ray tracing, capabilities aimed at bringing many desktop gaming features into a laptop-sized form.
The bigger pitch is the AI stack. NVIDIA is aiming to bring the CUDA and TensorRT AI ecosystem into “everyday” devices. It claims up to 1 petaflop of FP4 AI performance, targeting the ability to run large quantized models directly from the 128GB unified memory on the RTX Spark’s CUDA cores.
For models that exceed conventional GPU memory limits, RTX Spark’s 128GB unified memory is positioned as a more practical approach than relying on a faster GPU paired with only 16GB or 32GB of VRAM.
The direction also echoes what Apple has done with Apple Silicon: large unified memory, Arm CPUs, and a tightly integrated system. RTX Spark takes that same convergence of smartphone-style Arm design and shared-memory thinking, then adds NVIDIA’s Blackwell GPU, CUDA acceleration, and an unusually large memory pool targeted at local AI inference and server-tier workloads.
Whether this pivot becomes a hit will come down to one thing nobody can ignore: price. The existing Linux-based DGX Spark desktop suggests costs will be very high. The first wave of laptops launching this fall could land in the premium range, a hard sell in a market where many Windows buyers are still dealing with RAM-limited systems.
But for the smaller group of Windows users who want to run their own powerhouse AI workloads—without waiting for cloud access—RTX Spark is an attempt to finally make the idea workable on the device itself.
So… is this gonna make my laptop stop crashing or what?
128GB unified memory sounds insane but I don’t even know if Windows will actually use it. Also I thought Arm PCs were limited already so how is this different besides marketing?
Wait you’re saying it’s like smartphone silicon, but it’s “server-like”? That’s literally what every chip company says lol. I just hope it doesn’t mean everything runs hotter and the battery life dies, because my last ARM laptop was like 6 hours max.
Not gonna lie, this feels like NVIDIA trying to win the AI laptop race again. The part about moving past 16GB RAM makes me think they’re basically selling overpriced RAM upgrades disguised as AI. And if it’s debuting on Surface/ASUS/Dell/HP/Lenovo/MSI then it’s already everywhere so I’m confused why this is “new” late this year.