MemLLM: Finetuning LLMs to Use Explicit Read-Write Memory
Authors: Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schuetze
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular. Our evaluation on Re-DocRED (Tan et al., 2022) demonstrates that MemLLM achieves better perplexity compared to baselines without memory components, with strong gains on named entities. We also show that MemLLM outperforms non-memory-based methods on knowledge editing. |
| Researcher Affiliation | Collaboration | (1) Center for Information and Language Processing, LMU Munich, Germany; (2) Munich Center for Machine Learning, Germany; (3) Microsoft, Berlin, Germany |
| Pseudocode | Yes | Algorithm 1 presents the pseudocode for the process of generating MemLLM's memory-read training data. See Section 3.3 for a detailed description of the same process. |
| Open Source Code | Yes | The project repository is publicly available at: https://github.com/amodaresi/MemLLM |
| Open Datasets | Yes | We use three such datasets. (i) Re-DocRED (Tan et al., 2022): Wikipedia texts annotated (in a Wikidata format)... (ii) DocRED's distantly supervised training set... (iii) A set of counterfactual variations of Re-DocRED (Modarressi et al., 2024)... Our primary source is a full dump of English Wikipedia... available at: https://huggingface.co/datasets/wikimedia/wikipedia |
| Dataset Splits | Yes | We select 1000 examples from the human-annotated split of DocRED as positive examples where the focus sentence is annotated as evidence. For negative examples, we choose 1000 examples where the focus sentence contains at least one entity but there is no evidence for the relation in the focus sentence. |
| Hardware Specification | No | The paper mentions finetuning with a Mistral-7B-v0.1 model but does not specify any hardware details like GPU/CPU models, memory, or cloud instances used for running the experiments. |
| Software Dependencies | No | The paper mentions using a Mistral-7B-v0.1 model, Adam optimizer, and LoRA, but does not provide specific version numbers for software libraries or dependencies like Python, PyTorch, or the Hugging Face Transformers library. |
| Experiment Setup | Yes | We finetune MemLLM, with a Mistral-7B-v0.1 model (Jiang et al., 2023) using an Adam optimizer (Kingma & Ba, 2015), with the learning rate set to 2 × 10⁻⁵, 2 epochs, and a batch size of 96. For LoRA-specific parameters, we apply a dropout rate of 0.1, with a rank of 16 and an alpha weight of 8. We opted to set Qthr to 30... We set τe and τt to 0.7 and τr to 0.85. We set these values to τe = 0.85, τt = 0.2 and τr = 0.6 respectively for model editing experiments. |
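The hyperparameters quoted in the Experiment Setup row can be collected into a single sketch for anyone attempting a reproduction. The dictionary below only restates the values reported in the paper; the key names and the helper function are illustrative, not taken from the MemLLM codebase (see the repository linked above for the authors' actual configuration):

```python
# Hedged sketch of the reported finetuning setup. Key names are
# illustrative; consult https://github.com/amodaresi/MemLLM for the
# authors' actual configuration files.

FINETUNE_CONFIG = {
    "base_model": "mistralai/Mistral-7B-v0.1",
    "optimizer": "adam",
    "learning_rate": 2e-5,
    "epochs": 2,
    "batch_size": 96,
    # LoRA-specific parameters
    "lora_dropout": 0.1,
    "lora_rank": 16,
    "lora_alpha": 8,
    # Thresholds reported for the language-modeling experiments
    "q_thr": 30,
    "tau_e": 0.7,
    "tau_t": 0.7,
    "tau_r": 0.85,
}

# The paper overrides the tau thresholds for model-editing experiments.
EDITING_OVERRIDES = {"tau_e": 0.85, "tau_t": 0.2, "tau_r": 0.6}


def config_for(task: str) -> dict:
    """Return the reported hyperparameters for a given experiment type."""
    cfg = dict(FINETUNE_CONFIG)
    if task == "editing":
        cfg.update(EDITING_OVERRIDES)
    return cfg
```

For example, `config_for("editing")["tau_t"]` yields 0.2, while `config_for("language_modeling")` keeps the 0.7/0.7/0.85 thresholds.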