Locating and Editing Factual Associations in Mamba

Architecture of a MambaBlock. W_a and W_g are up-projectors 2d × d matrices, while W_o is a down projection matrix of size d × 2d.

What does Activation Patching Reveal?

To understand where Mamba stores factual associations, we apply activation patching (a). To estimate the contribution of different module-token states towards a correct factual prediction (s = Michael Jordan, r = professionally played, o = basketball) , we run the model 3 times --

Clean Run, G: Run the model on a prompt to extract the fact we are interested in x = (s, r) = Michael Jordan professionally played
Corrupted Run, G^*: Run the model with a different s' that changes the model output to o' (o' != o)
Patched Run, G^*[← h^l_i ] In the corrupted run G^*, state h^l_i is restored by patching its corresponding state from the clean run G.

In the patched run, we monitor to what extent the clean state h^l_i recovers the original prediction o. And, we call this recovery the indirect effect (IE) of the state h^l_i on the prediction o is recalling the fact (s, r, o). See and rome.baulab.info for details on this.

When activation patching is applied on the residual states (b), we observe 2 distinct regions of high IE.

early site: The early middle layers show high IE at the subject last token
late site: The late layers show high IE at the very last token of the prompt.

The high IE at the late site is natural as these LMs are autoregressive restroring a late state would mean restoring most of the model computation. However, the early site is surprising and suggests that the LM stores the fact in these states. This finding is consistent with what Meng et al (2022) observed in GPT models.

We can apply activation patching to all the different types of states in MambaBlock, the Conv + SSM output s_i, the gating output g_i, and the MambaBlock output o_i.

Can ROME Edit Facts in Mamba?

ROME, or Rank-One Model Editing was designed to edit/insert facts in GPT (or a autoregressive transformer LM). ROME views the down projection matrix W_down of GPT as a associatetive memory mapping specific keys to specific values. And, ROME inserts a new fact (a new key-value pair) by directly adding a rank-one matrix to W_down. Checkout rome.baulab.info for details on this.

ROME performance by modifying different projection matrices W_a, W_g, and W_o in MambaBlock.

We apply ROME to all the 3 projection matrices in MambaBlocks across different layers and plot its performance in changing different facts. We use the same evaluation suite used by Meng et al (2022). Efficacy (ES) score indicate if we can make the LM say say the new fact. But, knowing a fact differs from making the LM say it. To test for that ROME uses Generalization (PS) score indicates if the edited knowledge is robust to changes in wordings and context. And, Specificity (NS) ensures that the edit does not change unrelated facts; i.e. if after inserting that Michael Jordan played soccar, the LM should not map some other athlete (say Lebron James) to soccar as well. The final score S is a harmonic mean of ES, PS, and NS.

We find that ROME can successfully edit facts in a range of layers in Mamba. Modifying W_a cannot achieve high specificity in early layers. And, W_g becomes an unpredictable mediator after layer 30. ROME achives the best performance by modifying W_o projections.

Can we study Attention in an Attention-Free Model?

Path-dependent attention knockout experiments have been successful in understanding factual information flow in Transformer LMs (Geva et al 2023). Those experiments block out the information that flows from the q^th token to the k^th token via a attention head and monitor the effect on some task. That can be achived in transformer LMs by directly modifying the attention matrix calculated by the head.

However, in Mamba, it is difficult to achive this due to certain architectural choices. See the figure below.

Roll-out of Conv + SSM operation in MambaBlock. The SiLU non-linearity between Conv1D and SSM makes it difficult to calculate how much information c_i retains from a_i.

However, we can block out information flow by mean-ablating the subject, subject-last, or other tokens to all the future tokens in different layers to understand Conv + SSM in which layers relay what information.

The purple lines show that Mamba uses the Conv + SSM operation in early middle layers to relay information about the non-subject tokens. Blocking information from non-subject tokens in these layers can reduce the probability of the correct prediction p(o) by up to 50%.
The green lines show that Mamba moves the subject information in 2 steps

First, if the subject contains multiple tokens, the very early layers collate the information from all the subject tokens to the subject last token position.
Second, Mamba uses the Conv + SSM paths in later layers (43 - 48) to propagate critical information about the subject to later tokens.

Results from blocking information flow about certain tokens to all the future tokens.

Previous Work

meng-2022 Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. Locating and Editing Factual Associations in GPT 2022.
Notes: Applies causal tracing or activation patching on GPT models to understand critical states that mediate factual information. Introduces ROME to update/insert a single fact in the LM by directly modifying the down projection matrix of a MLP module.

meng-2023 Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau. Mass-Editing Memory In a Transformer 2023.
Notes: Scales up ROME to edit thousands of facts in a autoregressive transformer LM by distributing the edit across a range of critical middle layers.

hernandez-2023 Evan Hernandez*, Arnab Sen Sharma*, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau. Linearity of Relation Decoding in Transformer LMs 2023.
Notes: Shows that for a range of relations how the LM extracts relation specific information (decodes the relation) given a prompt x = (s, r) can approximated by a simple linear model. And, that linear model can be achived by taking a first order Taylor expansion of the LM computation itself.

Interpreting Mamba

hernandez-2023 Ameen Ali, Itamar Zimerman, Lior Wolf The Hidden Attention of Mamba Models 2024.
Notes: Shows that the information selective-SSM brings to the k^th token state from the convolved q^th token state can be calculated. And, this information can be visualized as a heatmap per channel (dim), resembling the attention maps in transformers.

hernandez-2023 Gonçalo Paulo, Thomas Marshall, Nora Belrose Does Transformer Interpretability Transfer to RNNs? 2024.
Notes: Finds that a set of selected interpretability tools designed for transformer LMs can be applied to SOTA RNN architectures such as Mamba, RWKV models "out-of-the-box".

This work is not yet peer-reviewed. The preprint can be cited as follows.

bibliography

Arnab Sen Sharma, David Atkinson, and David Bau. Locating and Editing Factual Associations in Mamba. Preprint, arXiv:2404.03646 (2024).

bibtex

@article{sensharma2024locating,
    title={Locating and Editing Factual Associations in Mamba}, 
    author={Arnab Sen Sharma and David Atkinson and David Bau},
    year={2024},
    eprint={2404.03646},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}