IBM’s new chip architecture points to faster, more energy-efficient AI

News | October 24, 2023

By Wisse Hettinga

A new chip prototype (NorthPole) from IBM Research’s lab in California, long in the making, has the potential to upend how and where AI is used efficiently.

NorthPole is a breakthrough in chip architecture that delivers massive improvements in energy, space, and time efficiency, according to IBM Fellow Dharmendra Modha. Using the ResNet-50 model as a benchmark, NorthPole is considerably more efficient than common 12-nm GPUs and 14-nm CPUs. (NorthPole itself is built on a 12-nm node process.) Against both, NorthPole is 25 times more energy efficient, measured in frames interpreted per joule of energy. NorthPole also came out ahead on latency and on space efficiency, measured in frames interpreted per second per billion transistors. According to Modha, on ResNet-50, NorthPole outperforms all major prevalent architectures, even those built on more advanced process technology, such as a GPU implemented using a 4-nm process.

How does it manage to compute so much more efficiently than existing chips? One of the biggest differences is that all of the memory for the device is on the chip itself, rather than connected separately. That removes the von Neumann bottleneck, the cost of shuttling data between separate memory and compute, so the chip can carry out AI inferencing considerably faster than other chips already on the market. NorthPole was fabricated with a 12-nm node process and contains 22 billion transistors in 800 square millimeters. It has 256 cores and can perform 2,048 operations per core per cycle at 8-bit precision, with the potential to double and quadruple the number of operations at 4-bit and 2-bit precision, respectively.
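To put those figures in perspective, the short Python sketch below works through the arithmetic: chip-wide operations per cycle at each precision, and average transistor density. The constant and function names are illustrative, not IBM's. The article gives no clock frequency, so the sketch stops at per-cycle numbers rather than guessing at operations per second.

```python
# Figures quoted in the article; names are illustrative, not from IBM.
CORES = 256
OPS_PER_CORE_PER_CYCLE_8BIT = 2_048
TRANSISTORS = 22_000_000_000
AREA_MM2 = 800


def ops_per_cycle(bits: int) -> int:
    """Chip-wide operations per cycle at 8-, 4-, or 2-bit precision."""
    if bits not in (8, 4, 2):
        raise ValueError("the article only describes 8-, 4-, and 2-bit precision")
    # Per the article, 4-bit doubles and 2-bit quadruples the 8-bit rate.
    scale = 8 // bits
    return CORES * OPS_PER_CORE_PER_CYCLE_8BIT * scale


def transistors_per_mm2() -> float:
    """Average transistor density across the die."""
    return TRANSISTORS / AREA_MM2


if __name__ == "__main__":
    for bits in (8, 4, 2):
        print(f"{bits}-bit: {ops_per_cycle(bits):,} operations per cycle")
    print(f"density: {transistors_per_mm2():,.0f} transistors per mm^2")
```

At 8-bit precision this comes to 524,288 operations per cycle across the chip, and the density works out to 27.5 million transistors per square millimeter.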

But the biggest advantage of NorthPole is also a constraint: it can easily draw only on the memory it has onboard. All of the speedups that are possible on the chip would be undercut if it had to access information from another place. Via an approach called scale-out, however, NorthPole can support larger neural networks by breaking them down into smaller sub-networks that fit within NorthPole’s model memory, and connecting these sub-networks together on multiple NorthPole chips. So while there is ample memory on a NorthPole (or collectively on a set of NorthPoles) for many of the models that would be useful for specific applications, this chip is not meant to be a jack of all trades.
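The scale-out idea can be illustrated with a toy example. The Python sketch below is not IBM's partitioning method, which the article does not describe. It is a simple greedy split that walks a network's layers in order and starts a new chip whenever the next layer would overflow a per-chip memory budget. The function name, the layer sizes, and the 128 MB budget are all hypothetical, and the sketch ignores activation memory and chip-to-chip communication.

```python
def partition_layers(layer_sizes_mb, chip_memory_mb):
    """Greedily group consecutive layers so each group fits on one chip.

    Returns a list of groups, each a list of layer indices. This is an
    illustrative heuristic, not NorthPole's actual scale-out algorithm.
    """
    groups = []
    current = []
    used = 0
    for index, size in enumerate(layer_sizes_mb):
        if size > chip_memory_mb:
            raise ValueError(f"layer {index} alone exceeds the per-chip budget")
        # Start a new sub-network when this layer would overflow the chip.
        if current and used + size > chip_memory_mb:
            groups.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical per-layer weight sizes (MB) and per-chip budget (MB).
    sizes = [40, 60, 80, 30, 90, 20]
    print(partition_layers(sizes, chip_memory_mb=128))
    # prints [[0, 1], [2, 3], [4, 5]]
```

Each inner list is a sub-network assigned to one chip. In this toy case, a model too large for a single 128 MB budget is spread across three chips.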