Edge 425: Inside Mamba, the Most Famous SSM Model

Created Using Ideogram

In this issue:

  1. An introduction to the Mamba SSM paper.

  2. A review of the original Mamba paper by Princeton and Carnegie Mellon University.

  3. An overview of the Griptape framework for building LLM apps.

💡 ML Concept of the Day: Diving Into Mamba

When it comes to State Space Models (SSMs), no architecture has achieved more notoriety than Mamba. The famous architecture was introduced by researchers from Carnegie Mellon and Princeton University in a recent paper and is capable of achieving performance comparable to the Transformer while handling extensive sequence lengths, such as one million tokens. This breakthrough is possible because Mamba eliminates the “quadratic bottleneck” found in the Attention Mechanism. Additionally, Mamba operates with impressive speed, reportedly up to five times faster than the Transformer.
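To make the contrast concrete, here is a minimal sketch of the linear recurrence at the heart of SSMs: each token updates a fixed-size hidden state, so the cost grows linearly with sequence length rather than quadratically. The dimensions and matrices are illustrative assumptions; actual Mamba uses input-dependent (selective) parameters and a hardware-aware parallel scan instead of a Python loop.

```python
# Toy sketch of a discrete state space recurrence (not Mamba's exact formulation).
import numpy as np

def ssm_scan(x, A, B, C):
    """Compute y_t = C h_t with h_t = A h_{t-1} + B x_t over a sequence.

    x: (seq_len, d_in) input tokens
    A: (d_state, d_state) state transition matrix
    B: (d_state, d_in)   input projection
    C: (d_out, d_state)  output projection
    """
    h = np.zeros(A.shape[0])          # fixed-size hidden state
    outputs = []
    for x_t in x:                     # one pass over the sequence: O(seq_len)
        h = A @ h + B @ x_t           # the state summarizes all past context
        outputs.append(C @ h)         # readout for the current token
    return np.stack(outputs)

# Each step touches only the fixed-size state, never all previous tokens
# the way attention does, so there is no quadratic term in sequence length.
seq = np.random.randn(1_000, 8)
A = np.eye(16) * 0.9
B = np.random.randn(16, 8) * 0.1
C = np.random.randn(4, 16) * 0.1
print(ssm_scan(seq, A, B, C).shape)   # (1000, 4)
```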

To put this in context, let’s look at how Mamba improves two of the essential functions of foundation models: communication between tokens and computation within a token. In Transformers, these roles are handled by the Attention mechanism (communication) and Multilayer Perceptrons (MLPs) (computation). Improvements to Transformer models typically focus on optimizing these two functions, as the sketch below illustrates.
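The following sketch labels those two roles inside a standard pre-norm Transformer block. The layer sizes and structure are illustrative assumptions, not the exact configuration from the Mamba or Transformer papers; the point is only where communication and computation live.

```python
# Hedged sketch of a pre-norm Transformer block, annotating the two roles
# described in the text; dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(          # computation within each token
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                  # x: (batch, seq_len, d_model)
        # Communication between tokens: every token attends to every other,
        # which is where attention's quadratic cost in sequence length arises.
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        # Computation within a token: the MLP acts on each position independently.
        return x + self.mlp(self.norm2(x))
```

Mamba's contribution, covered in the rest of this issue, is to replace the attention-based communication step with a selective SSM while keeping the per-token computation cheap.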