Nvidia introduced its superchip concept at its GTC conference in March. “Superchip” is the company’s name for a module carrying two compute dies: the Grace superchip has two Grace CPUs, and the Grace Hopper superchip has one Grace CPU and one Hopper GPU.
Grace Hopper features a 900 GB/s NVLink-C2C connection between the Grace CPU and Hopper GPU, effectively extending Hopper’s memory to 600 GB (Hopper alone has 80 GB). According to Nvidia, the link is 15x faster than traditional CPU data transfer rates. The extra memory is crucial for AI acceleration since AI models are rapidly increasing in size; keeping the entire model on one GPU reduces latency during inference (latency is particularly critical for hyperscalers running real-time NLP and recommendation models).
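To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. The 900 GB/s link rate, the 15x ratio, and the 80 GB and 600 GB capacities come from the text above; the implied baseline rate and the idealized transfer times (which ignore latency and protocol overhead) are illustrative assumptions, not Nvidia measurements.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the article.
# Assumption: transfers run at the full quoted rate with no overhead.

NVLINK_C2C_GB_PER_S = 900.0   # quoted NVLink-C2C bandwidth
QUOTED_RATIO = 15.0           # "15x traditional CPU data transfer rates"
HOPPER_LOCAL_GB = 80.0        # memory on Hopper alone
EXTENDED_GB = 600.0           # memory reachable with Grace attached


def implied_baseline_gb_per_s() -> float:
    """Baseline transfer rate implied by the quoted 15x ratio."""
    return NVLINK_C2C_GB_PER_S / QUOTED_RATIO


def ideal_transfer_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Idealized time to move size_gb at a sustained rate."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s


if __name__ == "__main__":
    # Data that lives outside Hopper's own 80 GB.
    remote_gb = EXTENDED_GB - HOPPER_LOCAL_GB
    baseline = implied_baseline_gb_per_s()
    fast = ideal_transfer_seconds(remote_gb, NVLINK_C2C_GB_PER_S)
    slow = ideal_transfer_seconds(remote_gb, baseline)
    print(f"implied baseline rate: {baseline:.0f} GB/s")
    print(f"{remote_gb:.0f} GB over NVLink-C2C: {fast:.2f} s")
    print(f"{remote_gb:.0f} GB over baseline:   {slow:.2f} s")
```

Under these assumptions, sweeping through all of the memory beyond Hopper's own 80 GB takes well under a second over NVLink-C2C, against several seconds at the implied baseline rate.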
Grace Hopper is already gaining traction in supercomputers, including ALPS in Switzerland.
“The reason it’s interesting for [HPC] is energy efficiency is a very important figure right now,” Ian Buck, vice president of Hyperscale and HPC at Nvidia, told EE Times. “Demand for compute isn’t slowing down. We can build supercomputers that are faster, better and consume less power to replace previous systems that might be less performant… you can actually reduce the energy footprint of computing by moving to more performant supercomputing architectures like Grace Hopper.”
As well as decreasing time to solution, another way to reduce energy consumption is by reducing the computational needs of some parts of supercomputing workloads.
“Traditional simulation isn’t going anywhere — we will continue to simulate climate science, weather, molecular dynamics, and proteins with first principles physics — but if we can augment some types of simulations with AI, we can speed them up so they can do the work they need to do with many fewer clock cycles and in much less time,” Buck said. The overall effect is to use less energy.
The Grace superchip has 144 Arm CPU cores and close to 1 TB/s of memory bandwidth across its two CPUs, and it achieves a SPECint rate score of 740 (using the GCC compiler).
“Grace allows us to build a CPU that was designed for AI infrastructure,” Buck said, adding that Grace uses a standard Arm v9 core from an upcoming Arm product range, with the standard instruction set. “[Grace is about] taking a standard Arm core and building the best possible chip that can be made [to complement] our GPUs for AI workflows.”
Each Grace CPU sits alongside 16 specially made LPDDR5X memory chiplets (eight on the front, eight on the back), which include data resiliency and ECC features to make them suitable for the data center rather than LPDDR’s more typical mobile or edge device applications. The memory is tightly coupled with the CPU, giving each Grace a huge 500 GB/s of memory bandwidth.
LPDDR (the LP stands for “low power”) offers much better performance per watt than standard DDR. This and the custom form factor contribute to making Grace a compact, efficient CPU, Buck said, adding that Grace’s performance per watt is around double that of other CPUs on the market today.
Far from merely feeding one or more Hopper GPUs, the Grace superchip will be used as an accelerator in its own right for scientific workloads. Acceleration features include Arm’s Scalable Vector Extension (SVE), which supports a vector-length agnostic (VLA) programming model: code is written without assuming a fixed vector length and adapts to whatever length the hardware provides. VLA means the same program can run without being recompiled or rewritten if longer vectors are used further down the line.
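The VLA idea can be illustrated with a small simulation in plain Python. This is only a sketch of the programming model, not real SVE code: the `vector_length` parameter stands in for the lane count that hardware would report, and the function name `vla_add` is hypothetical.

```python
# Plain-Python illustration of a vector-length-agnostic (VLA) loop.
# This simulates the idea only; it does not execute SVE instructions.
from typing import List


def vla_add(a: List[float], b: List[float], vector_length: int) -> List[float]:
    """Add two arrays in chunks of `vector_length` lanes.

    The loop never hard-codes the lane count. It takes the count as a
    parameter (standing in for a hardware query) and masks off lanes
    that fall past the end of the data.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vector_length < 1:
        raise ValueError("vector_length must be positive")

    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        # Predicate: which lanes of this chunk hold valid elements.
        active = [i + lane < n for lane in range(vector_length)]
        for lane, is_active in enumerate(active):
            if is_active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += vector_length
    return out


if __name__ == "__main__":
    x = [float(v) for v in range(10)]
    y = [1.0] * 10
    expected = [v + 1.0 for v in x]
    # Same function, different simulated hardware vector widths.
    for lanes in (4, 8, 16):
        assert vla_add(x, y, lanes) == expected
    print("identical results for 4, 8 and 16 lanes")
```

The point of the sketch is that nothing in the loop body changes when the lane count does; the tail of the array is handled by the predicate rather than by a separate scalar cleanup loop.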
“This is an ultimate CPU capability for compute-rich CPU workloads, there’s definitely interest in that space,” Buck said. “In the accelerated computing work we’ve done up to this point, we focused on the applications where the majority of the compute cycles are spent. Hot areas are molecular dynamics, some physics work, energy, and there is a long tail of HPC applications which haven’t gotten around to being ported to GPUs.”
There are two main reasons why code wouldn’t already be ported to GPUs, Buck explained.
“There is a long tail of applications that are written in Fortran, that can’t be modified because they’ve been certified for a particular use case or workflow, and rewriting them would change their functionality in a way that would need recertification,” he said. “These are still very important workloads that still need to be supported and still need better CPUs.”
The other reason is that ensemble code may be used for things such as climate simulation, where there may be hundreds of smaller mathematical models. Individually, they may not require much compute, but there are a lot of them, so porting them all would take a long time.
“We can accelerate climate simulation by not only giving them Hopper, which will be great at the GPU-accelerated portions, but also Grace, which will help accelerate the rest of the code that is being used in a global climate model which is trying to simulate literally everything that the Earth is experiencing, from solar radiation to cloud formation, to ocean currents, to forestry, to how the rainforests breathe… there’s a huge list of simulations that are running in parallel.”
As Buck points out, while some smaller models don’t run very long, Amdahl’s law means they must also be accelerated to achieve an overall speedup, because the unaccelerated portion of a workload caps the total gain. “That’s what Grace will help do,” he said.
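A short Python sketch makes the Amdahl’s law argument concrete. The runtime fractions and speedup factors below are hypothetical numbers chosen for illustration, not figures from Nvidia or from any real climate code; the function name `overall_speedup` is likewise an assumption.

```python
# Amdahl's law generalized to several portions of a workload.
# All fractions and speedups used in the demo are hypothetical.
from typing import Sequence, Tuple


def overall_speedup(parts: Sequence[Tuple[float, float]]) -> float:
    """Return the whole-workload speedup.

    `parts` holds (fraction_of_original_runtime, speedup) pairs.
    Fractions must sum to 1; a speedup of 1.0 means "not accelerated".
    """
    total = sum(fraction for fraction, _ in parts)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("runtime fractions must sum to 1")
    if any(speedup <= 0 for _, speedup in parts):
        raise ValueError("speedups must be positive")
    # New runtime is the sum of each portion's shrunken share.
    new_runtime = sum(fraction / speedup for fraction, speedup in parts)
    return 1.0 / new_runtime


if __name__ == "__main__":
    # Assume 80% of runtime is GPU-friendly and gets 20x faster,
    # while the remaining 20% is long-tail CPU code.
    gpu_only = overall_speedup([(0.8, 20.0), (0.2, 1.0)])
    # Same workload, but the CPU-side code also runs 2x faster.
    gpu_and_cpu = overall_speedup([(0.8, 20.0), (0.2, 2.0)])
    print(f"GPU portion only accelerated: {gpu_only:.2f}x")
    print(f"CPU portion also accelerated: {gpu_and_cpu:.2f}x")
```

With these made-up inputs, accelerating only the GPU-friendly 80% yields roughly a 4.2x overall gain, while also doubling the speed of the remaining CPU code lifts it to roughly 7.1x, which is the effect Buck describes.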
The new superchips will also allow for different configurations of homogeneous or heterogeneous compute.
“We’re going into a really interesting space where traditionally we’ve [used] one CPU chip to four GPU chips, and that’s because we focused our value on GPU workloads,” he said. “There may have been a CPU to manage that, but maybe there’s a separate CPU cluster to do the CPU workloads.”
“Grace Hopper will be an interesting experience, because now you have a one-to-one ratio, so you could potentially build a supercomputer that is great at both CPU and GPU workloads, all in one,” he said. “We think that is pretty valuable and it’s interesting to see how that will play out. We also have the Grace CPU servers as well, so people can still do heterogeneous configurations if they want to break up the workloads that way.”
Server makers are responding to the HPC market’s interest in the performance that superchips can offer.
At Computex this week, server makers Supermicro, Gigabyte, Asus, Foxconn, QCT, and Wiwynn unveiled plans to make servers with Nvidia superchips. For example, Supermicro said it will initially deploy a limited number of Grace superchip servers, starting with a 2U 2-node selection, with more configurations to follow. Supermicro is marketing these servers for digital twins, AI, HPC, cloud graphics, and gaming workloads.
All the upcoming servers will be built on four new 2U Nvidia designs, offered in one-, two- and four-way configurations for different use cases. Currently, these include designs with Grace Hopper for AI/HPC, designs with the Grace superchip for HPC, and Grace superchip plus GPU designs, which will be used for digital twins, collaboration, cloud graphics, and gaming.
The first servers with Grace superchips and Grace Hopper should be available in the first half of next year.