In-memory computing for machine learning inference and training

A large domain of interest for non-von-Neumann computing architectures is supporting deep learning and other modern machine learning techniques. Artificial neural networks are inspired by neuro-anatomical observations: data flows between layers of neurons, with computations distributed throughout. This data flow bears practically no resemblance to the layout of a von Neumann CPU, which makes CPUs tremendously inefficient at the computations neural networks require. GPUs offer a great improvement and are the gold standard today, but significant room for improvement remains, and future machine learning that adopts even more biological inspiration will only widen this gap.

Our team and collaborators have spent many years exploring new architectures better matched to neural network inference and training. Two key insights are that brains operate through intertwined compute and memory operations, and that high-precision arithmetic is not required; it is unnecessarily costly. The vast majority of computations performed in artificial neural networks are matrix operations (linear algebra), and the majority of the energy and time they consume goes into fetching and moving the required data (synaptic weights and activations) across a chip. Both issues are addressed by non-volatile analog memristors, which can store synaptic weight values and perform matrix operations within the memory itself: in-memory computing. A simple crossbar geometry is a circuit layout that allows this. With weights stored as conductances in the memristor crossbar cells, applying the input vector as voltages on the row lines generates the matrix-vector product as currents on the column lines.
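The crossbar operation described above can be sketched numerically. This is a minimal model, not a device simulation: the conductance values and array size are illustrative, and non-idealities (wire resistance, device variability) are ignored. Each cell injects a current I = G · V (Ohm's law), and the column line sums those currents (Kirchhoff's current law), so the column currents realize a matrix-vector product in one analog step.

```python
import numpy as np

# Hypothetical 4x3 crossbar: each cell stores a synaptic weight as a
# conductance G[i, j] (in siemens). Values are illustrative only.
G = np.array([
    [1.0e-6, 2.0e-6, 0.5e-6],
    [3.0e-6, 1.0e-6, 2.0e-6],
    [0.5e-6, 4.0e-6, 1.0e-6],
    [2.0e-6, 2.0e-6, 3.0e-6],
])

# Input activations applied as voltages on the row lines.
V = np.array([0.2, 0.1, 0.3, 0.05])

# Column current j is the sum over rows of G[i, j] * V[i]:
# a matrix-vector multiplication performed by physics, not logic gates.
I = G.T @ V

# The same result written as the explicit double sum the crossbar
# performs physically, to make the correspondence concrete.
I_explicit = np.array([sum(G[i, j] * V[i] for i in range(4)) for j in range(3)])
assert np.allclose(I, I_explicit)
```

The key point is that the weights never move: the data (voltages) comes to the memory, rather than the memory contents being fetched to a distant arithmetic unit.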


Using this basic circuit, supplemented with many digital function blocks, larger architectural designs can be constructed that support any modern deep learning network, from Convolutional Neural Networks (CNNs) to Long Short-Term Memory (LSTM) networks and Restricted Boltzmann Machines (RBMs). Our work began with a design we called “ISAAC” (In-situ Analog Arithmetic in Crossbars), published at the 2016 International Symposium on Computer Architecture (ISCA). The work has since evolved in different directions: increasing performance through further optimizations, and increasing breadth through more flexible designs and functionality that support more network types, including the harder problem of neural network training (PANTHER).
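One practical step in mapping a trained network layer onto a crossbar is handling signed weights, since device conductances are non-negative. A common scheme in crossbar accelerators stores each weight as the difference of two conductances (a differential pair), with the two column currents subtracted at readout. The sketch below assumes this differential mapping and an illustrative maximum device conductance; it is a schematic illustration, not the specific encoding used in ISAAC.

```python
import numpy as np

# A signed weight matrix from a trained layer (illustrative values).
W = np.array([[0.8, -0.3],
              [-1.2, 0.5]])

# Assumed maximum programmable device conductance (hypothetical value).
g_max = 1e-6
g_scale = g_max / np.abs(W).max()   # scale weights into the device range

# Differential encoding: positive parts on one column, negative parts
# on a paired column, both as non-negative conductances.
G_pos = np.clip(W, 0, None) * g_scale
G_neg = np.clip(-W, 0, None) * g_scale

x = np.array([0.4, 0.7])            # input activations as row voltages

# Read both column currents and subtract, recovering the signed
# matrix-vector product after undoing the conductance scaling.
y = (G_pos.T @ x - G_neg.T @ x) / g_scale
assert np.allclose(y, W.T @ x)
```

Real designs add further machinery on top of this, such as slicing weights across multiple cells and digitizing partial sums, but the differential-pair idea is the basic bridge from signed arithmetic to non-negative device physics.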


Probabilistic computing is another architecture in the domain of non-von-Neumann computing. The stochasticity of biological neurons is widely considered to contribute to the brain's high efficiency in learning and inference. At the same time, many real-world tasks demand processing noisy data and performing inference together with an estimate of confidence. In-memory computing can emulate such bio-plausible stochastic processing, either by injecting noise into neurons or by exploiting the noisy behavior of synaptic devices. In this way, physical noise becomes a computational resource for neural networks and machine learning: real-world applications such as healthcare and object tracking could benefit from on-device Bayesian inference and uncertainty-aware deep learning.
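The idea of turning noise into a resource can be sketched as follows. Here device noise in the synapses is modeled as additive Gaussian perturbations of the weights (the layer shape, noise level, and ReLU activation are illustrative assumptions); repeating the same forward pass then yields a distribution of outputs, whose spread serves as a confidence estimate, in the spirit of Monte Carlo approaches to Bayesian deep learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_linear(x, W, sigma, rng):
    """One forward pass through a layer whose synaptic weights carry
    device noise, modeled as additive Gaussian perturbations."""
    W_noisy = W + rng.normal(0.0, sigma, size=W.shape)
    return np.maximum(W_noisy.T @ x, 0.0)   # ReLU activation

# Illustrative weights and input for a single 3-to-2 layer.
W = np.array([[0.9, -0.4],
              [0.2, 0.7],
              [-0.5, 0.3]])
x = np.array([0.5, 1.0, -0.2])

# Each pass samples a slightly different effective weight matrix,
# so the ensemble of outputs approximates a predictive distribution:
# the mean is the prediction, the standard deviation its uncertainty.
samples = np.stack([noisy_linear(x, W, sigma=0.05, rng=rng)
                    for _ in range(200)])
mean, std = samples.mean(axis=0), samples.std(axis=0)
```

In a memristor implementation the perturbations would come for free from intrinsic device stochasticity rather than from a software random number generator, which is what makes this style of uncertainty-aware inference attractive for low-power edge devices.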


Last updated: 16.8.2021
