One vendor pairing a CPU with a GPU, as Nvidia does in its Grace Hopper Superchip, could be called a coincidence. It is hard to call it one when Intel and AMD are doing the same thing with their Falcon Shores XPU and Instinct MI300 chips.
All three chips target the data center, which means that within the next two years AMD, Nvidia, and Intel will all have hybrid CPU+GPU chips entering the data center market.
CPU+GPU integration has arguably become the trend in future chip design.
Further integration of CPU and GPU
Intel launches XPU
Intel has announced a fusion processor called Falcon Shores, which it describes as an XPU. At its heart is a new processor architecture that packs Intel's x86 CPU and Xe GPU hardware into the same Xeon socket.
Falcon Shores is a tile-based chip that is highly scalable and flexible to better meet the needs of HPC and AI applications.
According to Intel's figures, compared with today's levels Falcon Shores will deliver more than five times the performance per watt, more than five times the x86 compute density, and more than five times the memory capacity and bandwidth.
The Falcon Shores chip will be available in 2024.
AMD launches a data center APU
AMD is also showing its ambitions in the data center.
APU, or "accelerated processing unit", is AMD's name for client CPUs with integrated graphics. AMD has been pursuing the APU idea since the heyday of its Opteron CPUs in 2006 and began shipping its first APUs for PCs in 2010. These were followed by a series of custom APUs for Sony's PlayStation 4 and 5 and Microsoft's Xbox Series X and S consoles, as well as a few Opteron APUs: the X2100 in 2013 and the X3000 in 2017.
Recently, AMD unveiled a roadmap that includes the 2023 Instinct MI300, AMD's first exascale APU, which it calls "the world's first data center APU."
The MI300 combines CPU and GPU cores in one package: specifically, Zen 4-based EPYC CPU cores and a GPU built on the new CDNA 3 architecture.
AMD says the Instinct MI300 is expected to deliver more than an eightfold improvement in AI training performance over its Instinct MI250X. Compared with the CDNA 2 GPU architecture that underpins the Instinct MI200 family, the CDNA 3 architecture in the Instinct MI300 will provide more than a fivefold improvement in performance per watt for AI workloads.
The Instinct MI300 is due out in 2023.
Nvidia Grace Super Chip
Nvidia, which has long focused on GPU design, caused a stir last year when it announced a move into Arm-based CPUs. In March, Nvidia unveiled its Grace Hopper Superchip for HPC and large-scale AI applications. The chip combines an Nvidia Hopper GPU with a Grace CPU in an integrated module, connected via NVLink-C2C.
In the CPU+GPU Grace Hopper configuration, the number of CPU cores is halved relative to the CPU-only Grace Superchip and LPDDR5X memory is limited to 512GB, but the GPU adds 80GB of HBM3 memory, for a total memory bandwidth of up to 3.5TB/s. The cost is a power draw of 1,000W; each rack can accommodate 42 nodes.
Nvidia has also promised to launch its superchip in the first half of 2023.
Why are the big players adopting this design?
In terms of timing, Intel's Falcon Shores is due in 2024, AMD's Instinct MI300 in 2023, and Nvidia's Grace Hopper Superchip in the first half of 2023.
Why has the CPU+GPU form attracted all three major companies, which have brought it to the data center one after another?
First, in the era of the digital economy, computing power is becoming a new productive force that is widely integrated into all aspects of production and daily life. The data center is the physical carrier of computing power and key infrastructure for digital development. In 2021, the global data center market exceeded US$67.9 billion, up 9.8% from 2020. Data centers are a huge market, and they have long been on the tech giants' radar.
Second, data centers collect large amounts of data, so the chips deployed in them need a great deal of computing power, and combining a CPU with a GPU is one way to provide it. Raja Koduri, Senior Vice President and General Manager of the Accelerated Computing Systems and Graphics (AXG) Group at Intel, said that to succeed in the HPC market, chips need to be able to handle large data sets. A GPU has powerful computing capability and can run hundreds of cores in parallel, but today's discrete GPUs still have a major weakness: large data sets do not fit easily into the GPU's own memory, so the GPU spends time waiting while data is slowly loaded into that memory.
Putting the CPU and GPU into the same architecture addresses the memory problem in particular: it eliminates redundant copies of data, because a processor no longer needs to copy data into its own dedicated memory pool before reading or changing it. A unified memory pool also means there is no need for a second pool of memory chips, namely the DRAM attached to the CPU. For example, the Instinct MI300 will combine CDNA 3 GPU chiplets and Zen 4 CPU chiplets in a single processor package, and the two sets of processors will share on-package HBM memory.
According to Nvidia, the NVLink-C2C interconnect lets the Grace CPU transfer data to the Hopper GPU 15 times faster than a traditional CPU can. But for scenarios with very large data sets, even with high-speed interfaces such as NVLink and AMD's Infinity Fabric, the latency and bandwidth costs of exchanging data between the CPU and GPU remain high relative to the speed at which HPC-class processors manipulate data. Shortening the physical length of this link as much as possible therefore saves a lot of energy and improves performance.
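The cost of moving data between a CPU and a discrete GPU can be illustrated with a back-of-the-envelope model. The sketch below is an assumption-laden illustration, not a vendor benchmark: the function names, the baseline link bandwidth, the per-transfer latency, and the data set size are all hypothetical. Only the 80GB GPU memory size and the 15x ratio come from the figures quoted in this article. It estimates copy time as a fixed latency per chunk plus data size divided by bandwidth, for a data set too large to fit in GPU memory at once.

```python
import math

GB = 1e9  # decimal gigabyte, in bytes


def transfer_time_s(size_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Time to move size_bytes over a link: fixed latency plus size / bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bytes / bandwidth_bytes_per_s


def streaming_time_s(dataset_bytes, gpu_memory_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Total copy time when a data set is sent to the GPU in memory-sized chunks."""
    chunks = math.ceil(dataset_bytes / gpu_memory_bytes)
    # Each chunk pays the latency once; the bytes themselves pay size / bandwidth.
    return chunks * latency_s + dataset_bytes / bandwidth_bytes_per_s


DATASET = 400 * GB      # hypothetical data set size
GPU_MEMORY = 80 * GB    # HBM3 capacity quoted for Grace Hopper
BASELINE_BW = 50 * GB   # hypothetical baseline link bandwidth, bytes per second
LATENCY = 10e-6         # hypothetical per-transfer latency, seconds

links = {"baseline link": BASELINE_BW, "15x faster link": 15 * BASELINE_BW}
for name, bandwidth in links.items():
    seconds = streaming_time_s(DATASET, GPU_MEMORY, bandwidth, LATENCY)
    print(f"{name}: {seconds:.3f} s spent copying")

# With one memory pool shared by CPU and GPU, this model has no copy step at all.
print("shared memory pool: no copy step in this model")
```

Even in this simplified model, a faster link only shrinks the copy time, whereas a shared memory pool removes the copy step entirely, which is the argument made for on-package integration.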
AMD says the design will allow APUs to consume less power than implementations that use discrete CPUs and GPUs. Intel likewise says its Falcon Shores chip will significantly improve bandwidth, performance per watt, compute density, and memory capacity.
The appeal of customization
Integrating multiple independent components tends to bring long-term benefits, but it is not simply a matter of putting a CPU and a GPU on a single die. Intel, Nvidia, and AMD have all chosen a chiplet approach for their CPU+GPU products.
Traditionally, to develop a complex IC product, a vendor designed a single die that integrated all functions. With each generation, the number of functions per die increased dramatically. On the latest 7nm and 5nm nodes, cost and complexity have soared.
With a chiplet design, by contrast, small dies (chiplets) with different functions and process nodes are modularized and packaged together. A chip customer can pick from these chiplets and assemble them in an advanced package, producing a new, complex design as an alternative to a system on a chip (SoC).
Because of the modular nature of chiplets, the three giants have each developed their own multi-chip interconnects while also offering customization services.
Intel said at the launch of Falcon Shores that the architecture will use a chiplet approach, in which multiple dies and different processor modules made with different manufacturing processes are tightly packed into a single chip package. This allows Intel a higher level of customization in the CPUs, GPUs, I/O, memory types, power management, and other circuit types it can put into its chips.
In particular, Falcon Shores can deploy different tiles on demand, especially x86 CPU cores and Xe GPU cores, in flexible numbers and ratios depending on the workload.
Intel has also opened up its x86 architecture for licensing and has developed a chiplet strategy that allows customers to put Arm and RISC-V cores in one package.
More recently, AMD has also opened the door to customization. "We're focused on making chips easier and more flexible to implement," AMD Chief Technology Officer Mark Papermaster said at an analyst day conference.
AMD allows customers to implement multiple dies (also known as chiplets or compute tiles) in a compact chip package. AMD already uses tiles itself, but it now allows third parties to build accelerators or other chips and include them alongside its x86 CPUs and GPUs in 2D or 3D packages.
AMD's custom chip strategy will revolve around the new Infinity Architecture 4.0, the interconnect that links chiplets within a package. The proprietary Infinity architecture will be compatible with the CXL 2.0 interconnect.
The Infinity interconnect will also support UCIe (Universal Chiplet Interconnect Express) for connecting chiplets within a package. UCIe is already supported by Intel, AMD, Arm, Google, Meta, and others.
Will the next top chip be a multi-chip design?
Overall, AMD's server GPU trajectory is very similar to Intel's and Nvidia's. All three companies are moving toward combined CPU and GPU products: Nvidia's Grace Hopper (Grace + H100), Intel's Falcon Shores XPU (mixing and matching CPU and GPU tiles), and now AMD's MI300, which uses both CPU and GPU chiplets in a single package. In all three cases, the aim is to combine the best CPUs and the best GPUs for workloads that are not entirely constrained by either.
Akshara Bassi, a research analyst at Counterpoint Research, said: "As chip area becomes larger and wafer yield becomes more important, multi-chip modular package designs can achieve better power consumption and performance than single-chip designs."
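The link between die area and wafer yield can be illustrated with the simple Poisson yield model, in which the fraction of defect-free dies is exp(-area x defect density). The sketch below is illustrative only: the defect density and total silicon area are hypothetical values, the function names are invented for this example, and the model ignores packaging cost, interconnect overhead, and assembly losses. It assumes each chiplet is tested before packaging, so only good dies are assembled.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_chip(total_area_cm2, n_dies, defects_per_cm2):
    """Silicon area (cm^2) that must be fabricated per working chip.

    The chip is split into n_dies equal dies; each die is tested on its own,
    so a defect scraps only one small die instead of the whole chip.
    """
    die_area = total_area_cm2 / n_dies
    return total_area_cm2 / poisson_yield(die_area, defects_per_cm2)


DEFECT_DENSITY = 0.1  # hypothetical defects per cm^2
TOTAL_AREA = 8.0      # hypothetical total logic area in cm^2

for n in (1, 2, 4, 8):
    die_yield = poisson_yield(TOTAL_AREA / n, DEFECT_DENSITY)
    cost = silicon_per_good_chip(TOTAL_AREA, n, DEFECT_DENSITY)
    print(f"{n} die(s): yield per die {die_yield:.1%}, silicon per good chip {cost:.2f} cm^2")
```

Under these assumptions, splitting the same logic across more, smaller dies raises the yield per die and lowers the silicon wasted per working chip, which is one reason large designs move to multi-chip packages.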
Chiplets are here to stay, but for now the field is a set of islands: AMD, Apple, Intel, and Nvidia each apply their own interconnect designs to specific packaging technologies.
In 2018, Intel moved beyond EMIB (Embedded Multi-die Interconnect Bridge) to Foveros, its 3D stacking technology for logic dies. In 2019, Intel introduced Co-EMIB, which can interconnect two or more Foveros elements.
AMD pioneered the chiplet model and gained a technological advantage by fully adopting chiplet technology in 2019. Lisa Su outlined future plans in her presentation: "We are working closely with TSMC on their 3D architecture, combining chiplet packaging with die stacking to create 3D chiplet architectures for future HPC products."
On March 2 this year, ten companies (Intel, AMD, Arm, Qualcomm, TSMC, Samsung, ASE, Google Cloud, Meta, and Microsoft) announced the formation of a chiplet standards consortium and launched the Universal Chiplet Interconnect Express (UCIe) standard in the hope of unifying the industry.
To date, only a handful of chip giants have developed and manufactured chiplet-based designs. With the cost of developing chips on advanced nodes rising, the industry needs chiplets more than ever. Given this multi-chip trend, the next top chip will inevitably be a multi-chip design.