Can Remixing Memory Curb AI’s Energy Problem?

TSMC’s former R&D chief wants designers to combine different memory technologies


Katherine Bourzac is a freelance journalist based in San Francisco, Calif.

[Illustration: a stack of chip layers of different thicknesses connected by thin vertical lines.] The N3XT 3D chip includes layers of different memory types, such as spin-transfer-torque magnetic RAM and metal oxide resistive switching RAM.

The future of memory is massive, diverse, and tightly integrated with processing. That was the message of an invited talk this week at the International Electron Devices Meeting in San Francisco. H.-S. Philip Wong, an electrical engineer at Stanford University and former vice president of research at TSMC who remains a scientific advisor to the company, told the assembled engineers it’s time to think about memory and computing architecture in new ways.

The demands of today’s computing problems, particularly AI, are rapidly outpacing the capabilities of existing memory systems and architectures. Today’s software assumes it’s possible to randomly access any given bit of memory, and moving all that data back and forth between memory and processor consumes time and tremendous amounts of energy. And not all data is created equal: some of it needs to be accessed frequently, some infrequently. This “memory wall” problem calls for a reckoning, Wong argued.

Unfortunately, Wong said, engineers continue to focus on the question of which emerging technology will replace the conventional memory of today, with its hierarchy of SRAM, DRAM, and Flash. Each of these technologies—and their potential replacements—has tradeoffs. Some provide faster readout of data; others provide faster writing. Some are more durable; others have lower power requirements. Wong argued that looking for one new memory technology to rule them all is the wrong approach.

“You cannot find the perfect memory,” Wong said.

Instead, engineers should look at the requirements of their system, then pick and choose the combination of components that will work best. Wong argued for starting from the demands of specific software use cases. Picking the right combination of memory technologies is “a multidimensional optimization problem,” he said.

One Memory Can’t Fit All Needs

Trying to make one kind of memory the best at everything—fast reads, fast writes, reliability, retention time, density, energy efficiency, and so forth—leaves engineers “working too hard for no good reason,” he said, showing a slide with an image of electrical-engineer-as-Sisyphus, pushing a gear up a hill. “We’re looking at numbers in isolation without knowing what a technology is going to be used for. Those numbers may not even matter,” he said.

For instance, a system that will do frequent, predictable reads of memory and only infrequent writes is well served by magnetic RAM (MRAM), phase-change memory (PCM), or resistive RAM (RRAM). A system that will stream a high volume of data instead needs frequent writes, few reads, and only a short data lifetime, so engineers can trade retention for speed, density, and energy consumption and opt for gain cells or ferroelectric RAM (FeRAM).
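To make that workload-driven selection concrete, here is a minimal, hypothetical sketch in Python. It simply restates the example pairings above; the function names, thresholds, and numbers are illustrative assumptions, not device data or a real design tool.

    # Hypothetical sketch of workload-driven memory selection.
    # The mapping restates the article's example pairings; the cutoffs
    # and example numbers are arbitrary illustrations, not device data.

    WORKLOAD_TO_MEMORY = {
        # frequent, predictable reads and infrequent writes
        "read_mostly": ["MRAM", "PCM", "RRAM"],
        # streaming data: frequent writes, few reads, short data lifetime
        "write_heavy_short_lived": ["gain cells", "FeRAM"],
    }

    def classify(reads_per_sec: float, writes_per_sec: float, retention_s: float) -> str:
        """Coarsely bucket a workload by its access pattern and data lifetime."""
        if writes_per_sec > reads_per_sec and retention_s < 1.0:
            return "write_heavy_short_lived"
        return "read_mostly"

    def pick_memory(reads_per_sec: float, writes_per_sec: float, retention_s: float) -> list:
        """Return candidate memory technologies for the bucketed workload."""
        return WORKLOAD_TO_MEMORY[classify(reads_per_sec, writes_per_sec, retention_s)]

    # Inference weights: read constantly, rewritten rarely, kept for hours.
    print(pick_memory(reads_per_sec=1e6, writes_per_sec=10, retention_s=3600))
    # Streaming buffer: written constantly, read once, discarded within milliseconds.
    print(pick_memory(reads_per_sec=1e3, writes_per_sec=1e6, retention_s=0.01))

A real design, of course, would weigh many more axes at once, which is what makes the problem multidimensional.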


This kind of flexibility can lead to great benefits. As an example, Wong pointed to his work on hybrid gain cells, which are similar to DRAM cells but use two transistors in each memory cell instead of a transistor and a capacitor. One transistor is silicon and provides fast readout; the other, based on an oxide semiconductor, stores the data without needing refreshing. When these hybrid gain cells are combined with RRAM for AI and machine-learning training and inference, they provide a ninefold energy benefit compared with a traditional memory system.

And crucially, said Wong, these diverse memories need to be more closely integrated with computing. He argued for integrating multiple chips, each with its own local memories, into an “illusion system” that treats them all as part of one larger system. In a paper published in October, Wong and his collaborators propose a computing system that uses a silicon CMOS chip as its base, layered with fast-access dense memory made of STT-MRAM, layers of metal oxide RRAM for nonvolatile storage, and layers of high-speed, high-density gain cells. They call this a MOSAIC (MOnolithic, Stacked, and Assembled IC). To save energy, data can be stored near where it will be processed, and chips in the stack can be turned off when they’re not needed.
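As a rough illustration of that stacking idea, the sketch below models the layers and the power-gating behavior in Python. The layer names follow the article, but the classes and the gating rule are assumptions made here for illustration, not code from Wong’s paper.

    # Hypothetical model of a MOSAIC-style stack: a logic base plus memory
    # tiers, where idle layers can be powered off to save energy.
    from dataclasses import dataclass

    @dataclass
    class Layer:
        name: str        # e.g. "STT-MRAM", "RRAM", "gain cells"
        role: str        # what the layer holds or does
        powered: bool = True

    @dataclass
    class MosaicStack:
        layers: list     # ordered from the logic base up through the memory tiers

        def power_gate(self, active_names: set) -> None:
            """Turn off every layer not needed for the current task."""
            for layer in self.layers:
                layer.powered = layer.name in active_names

    stack = MosaicStack(layers=[
        Layer("CMOS logic", "compute"),
        Layer("STT-MRAM", "fast-access dense memory"),
        Layer("RRAM", "nonvolatile storage, e.g. model weights"),
        Layer("gain cells", "high-speed, high-density working data"),
    ])

    # Illustrative: during a read-mostly inference phase, keep only the logic
    # and the weight storage powered; the rest of the stack is gated off.
    stack.power_gate(active_names={"CMOS logic", "RRAM"})
    print([(layer.name, layer.powered) for layer in stack.layers])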

During the question session, an engineer in the audience said that he loved the idea but noted that all these different pieces are made by different companies that don’t work together. Wong replied that this is why conferences like IEDM, which bring everyone together, are important. With talk of starting up nuclear power plants to fuel AI data centers, it’s clear the industry is aware of its energy problem. “Necessity is the mother of invention,” he added.