Memory Mosaics

Abstract: Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in a comparatively transparent way (“predictive disentanglement”). We illustrate these capabilities on a toy example and also show that memory mosaics perform as well as or better than transformers on medium-scale language modeling tasks.

Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen and Léon Bottou: Memory Mosaics, The Thirteenth International Conference on Learning Representations, 2025.

iclr-mosaics-2025.djvu iclr-mosaics-2025.pdf iclr-mosaics-2025.ps.gz

@inproceedings{zhang-2025,
  title = {Memory Mosaics},
  author = {Zhang, Jianyu and Nolte, Niklas and Sadhukhan, Ranajoy and Chen, Beidi and Bottou, L\'{e}on},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year = {2025},
  url = {http://leon.bottou.org/papers/zhang-2025},
}

The following paper validates the design with a 10B-parameter model trained on 1T tokens.

Jianyu Zhang and Léon Bottou: Memory Mosaics at Scale, Advances in Neural Information Processing Systems, 38, Curran Associates, Inc., 2025.
