===== Memory Mosaics =====

//Abstract:// Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in a comparatively transparent way (“predictive disentanglement”). We illustrate these capabilities on a toy example and also show that memory mosaics perform as well as or better than transformers on medium-scale language modeling tasks.

{{ mosaic-steamroller.png?400 }}

Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen and Léon Bottou: **Memory Mosaics**, //The Thirteenth International Conference on Learning Representations//, 2025.

[[http://leon.bottou.org/publications/djvu/iclr-mosaics-2025.djvu|iclr-mosaics-2025.djvu]] [[http://leon.bottou.org/publications/pdf/iclr-mosaics-2025.pdf|iclr-mosaics-2025.pdf]] [[http://leon.bottou.org/publications/psgz/iclr-mosaics-2025.ps.gz|iclr-mosaics-2025.ps.gz]]

<code>
@inproceedings{zhang-2025,
  title     = {Memory Mosaics},
  author    = {Zhang, Jianyu and Nolte, Niklas and Sadhukhan, Ranajoy and Chen, Beidi and Bottou, L\'{e}on},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {http://leon.bottou.org/papers/zhang-2025},
}
</code>

==== Related ====

The following paper validates the design in a 10B-parameter model trained on 1T tokens.

Jianyu Zhang and Léon Bottou: **Memory Mosaics at Scale**, //Advances in Neural Information Processing Systems//, 38, Curran Associates, Inc., 2025. [[papers/zhang-bottou-2025|more...]]
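As a rough illustration of the building block named in the abstract, the sketch below implements one associative memory lookup as kernel smoothing over stored (key, value) pairs, i.e. a softmax-weighted average of the stored values. This is a minimal, assumption-laden sketch, not the paper's implementation; the function name, the exponential-kernel choice, and the ''beta'' temperature are all illustrative.

<code python>
# Minimal sketch of one associative memory retrieval step, assuming a
# kernel-smoothing formulation: the query is compared with every stored
# key, and the retrieved value is the softmax-weighted average of the
# stored values. All names here are illustrative, not the paper's code.
import numpy as np

def associative_memory_retrieve(query, keys, values, beta=1.0):
    """Return the kernel-smoothed value associated with `query`.

    query:  (d,)   key for the current position
    keys:   (t, d) keys stored so far
    values: (t, d) values stored alongside the keys
    beta:   inverse temperature of the exponential kernel (assumed)
    """
    scores = beta * (keys @ query)            # similarity to each stored key
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                   # weighted average of values

# Toy usage: store three (key, value) pairs, then query with the first key.
# A large beta sharpens retrieval toward the best-matching stored value.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 4))
print(associative_memory_retrieve(keys[0], keys, values, beta=4.0))
</code>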