Analog in-memory computing attention mechanism for fast and energy-efficient large language models
Nature Computational Science, published on 8 September 2025
Large language models (LLMs) are increasingly embedded in daily applications, but their growing energy footprint is a critical challenge. While much of the hardware research has focused on MLP layers that store model parameters, a major bottleneck lies in the attention mechanism, which requires frequent updates to the KV cache and therefore demands fast, energy-efficient, and writable memory.
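
As a toy illustration (not taken from the paper), the Python sketch below shows why the KV cache is write-intensive: every autoregressive decoding step appends one new key row and one new value row before the attention output can be computed, so the memory holding the cache is updated on every generated token. The helper name decode_step and all dimensions are illustrative assumptions.

```python
import numpy as np

def decode_step(x_t, W_q, W_k, W_v, k_cache, v_cache):
    """One autoregressive decoding step for a single attention head.

    Each call writes a new key/value row into the cache, which is why
    attention demands fast, frequently writable memory.
    """
    q = x_t @ W_q                      # query for the current token, shape (d,)
    k_cache.append(x_t @ W_k)          # write: new key row added to the KV cache
    v_cache.append(x_t @ W_v)          # write: new value row added to the KV cache

    K = np.stack(k_cache)              # (t, d) -- grows by one row per token
    V = np.stack(v_cache)              # (t, d)

    scores = K @ q / np.sqrt(q.shape[-1])   # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over all cached tokens
    return weights @ V                      # attended output for this token

# Toy usage: d=8-dimensional embeddings, 5 decoding steps.
rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
k_cache, v_cache = [], []
for _ in range(5):
    x_t = rng.standard_normal(d)       # stand-in for the current token embedding
    out = decode_step(x_t, W_q, W_k, W_v, k_cache, v_cache)
```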

In this work, we present an analog in-memory computing attention architecture based on gain-cell memories. These devices are CMOS-compatible, easy to write, and well suited for the repeated updates required by attention. Although the technology is not yet fully mature, it represents a promising path forward.
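
As a rough conceptual sketch only, and not the paper's gain-cell circuit, the following idealized behavioral model shows how attention dot products map onto an in-memory matrix-vector multiply: the cached keys are stored in the memory array, the query is applied as the input, and the products are formed where the data reside. The noise term is an assumed stand-in for analog non-idealities, not a device model.

```python
import numpy as np

def analog_mvm(stored_matrix, inputs, noise_std=0.01, rng=None):
    """Idealized in-memory matrix-vector multiply: the memory array holds the
    matrix (e.g. cached keys), the input vector is applied to the array, and
    the readout yields the dot products in place. Gaussian read noise is a
    crude stand-in for circuit non-idealities."""
    rng = rng or np.random.default_rng()
    outputs = stored_matrix @ inputs
    return outputs + noise_std * rng.standard_normal(outputs.shape)

# Attention scores K @ q computed "in memory": the keys never leave the array.
rng = np.random.default_rng(1)
K = rng.standard_normal((5, 8))        # 5 cached keys, written into the array
q = rng.standard_normal(8)             # current query, applied as the input
scores = analog_mvm(K, q, rng=rng)
```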
Authors: Nathan Leroux, Paul-Philipp Manea, Chirag Sudarshan, Jan Finkbeiner, Sebastian Siegel, John Paul Strachan & Emre Neftci
https://doi.org/10.1038/s43588-025-00854-1
Contact
Dr. Nathan Leroux
Postdoctoral Researcher
- Peter Grünberg Institute (PGI)
- Neuromorphic Compute Nodes (PGI-14)
- Neuromorphic Software Ecosystems (PGI-15)
Room C0.11