Analog in-memory computing attention mechanism for fast and energy-efficient large language models

Nature Computational Science, published 8 September 2025

Large language models (LLMs) are increasingly embedded in daily applications, but their growing energy footprint is a critical challenge. While much of the hardware research has focused on the MLP layers that store model parameters, a major bottleneck lies in the attention mechanism, which must update the key-value (KV) cache at every generated token and therefore demands fast, energy-efficient, and writable memory.
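To make this bottleneck concrete, the minimal sketch below (plain NumPy, our own illustration rather than code from the paper) shows how a decoder appends a new key and value vector to the KV cache at every generated token, so the memory holding the cache is rewritten continuously, unlike the static MLP weights:

```python
import numpy as np

def attend_with_kv_cache(q, k_new, v_new, k_cache, v_cache):
    """One decoding step: append the new key/value to the cache,
    then compute attention for the current query vector."""
    # The KV cache grows (is rewritten) at every generated token.
    k_cache = np.vstack([k_cache, k_new])        # (t, d)
    v_cache = np.vstack([v_cache, v_new])        # (t, d)

    scores = k_cache @ q / np.sqrt(q.shape[-1])  # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over past tokens
    out = weights @ v_cache                      # (d,)
    return out, k_cache, v_cache

d = 64
k_cache, v_cache = np.empty((0, d)), np.empty((0, d))
for _ in range(8):                               # 8 decoding steps
    q, k_new, v_new = (np.random.randn(d) for _ in range(3))
    out, k_cache, v_cache = attend_with_kv_cache(q, k_new, v_new, k_cache, v_cache)
print(k_cache.shape)  # (8, 64): the cache was rewritten at every step
```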


In this work, we present an analog in-memory computing attention architecture based on gain-cell memories. These devices are CMOS-compatible, easy to write, and well suited for the repeated updates required by attention. Although the technology is not yet fully mature, it represents a promising path forward.
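As a rough illustration of why attention maps well onto in-memory computing (a conceptual sketch under our own assumptions, not the architecture described in the paper), both core operations of attention, the score computation q·Kᵀ and the weighted sum over V, are matrix-vector multiplications, which is exactly what an analog memory array computes in place when the matrix is stored in its cells. The hypothetical `analog_mvm` helper and its `noise_std` parameter below are our own toy model of that readout:

```python
import numpy as np

def analog_mvm(matrix, vector, noise_std=0.02):
    """Toy model of an analog in-memory matrix-vector multiply:
    the matrix is assumed to reside in the memory cells (e.g. gain cells),
    and the result is read out with some analog noise."""
    ideal = matrix @ vector
    return ideal + noise_std * np.random.randn(*ideal.shape)

def analog_attention_step(q, K, V):
    # Score computation q·K^T, mapped onto a memory array holding K.
    scores = analog_mvm(K, q) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum over V, mapped onto a second array holding V.
    return analog_mvm(V.T, weights)

d, t = 64, 16
q, K, V = np.random.randn(d), np.random.randn(t, d), np.random.randn(t, d)
print(analog_attention_step(q, K, V).shape)  # (64,)
```

Because K and V change with every generated token, the arrays holding them must be rewritten constantly, which is why an easily writable memory such as a gain cell is attractive here.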

Authors: Nathan Leroux, Paul-Philipp Manea, Chirag Sudarshan, Jan Finkbeiner, Sebastian Siegel, John Paul Strachan & Emre Neftci
https://doi.org/10.1038/s43588-025-00854-1

Contact

  • Peter Grünberg Institute (PGI)
  • Neuromorphic Compute Nodes (PGI-14)
Building TZA-Aachen, Room C0.11
Phone: +49 241/92-780421

Dr. Nathan Leroux

Postdoctoral Researcher

  • Peter Grünberg Institute (PGI)
  • Neuromorphic Software Ecosystems (PGI-15)
Building TZA-Aachen, Aachen
Phone: +49 241/92-780921
