MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
Authors: Masakazu Yoshimura, Teruaki Hayashi, Yota Maeda
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate that PEFT performs more effectively for Mamba than Transformers. Lastly, we demonstrate how to effectively combine multiple PEFT methods and provide a framework that outperforms previous works. ... In the experiments, we benchmarked Mamba using PEFT methods, including seven main methods and a total of 20 derived variations (see Figure 1). ... We conduct our evaluation on the VTAB-1k image classification dataset (Zhai et al., 2019). ... In addition to the image tasks, we evaluate our method on language tasks using the vanilla Mamba Gu & Dao (2023). |
| Researcher Affiliation | Industry | Masakazu Yoshimura, Teruaki Hayashi & Yota Maeda, Sony Group Corporation, Japan |
| Pseudocode | Yes | The detailed algorithm is provided in Appendix B. ... Algorithm 1 Hybrid PEFT Search Algorithm |
| Open Source Code | Yes | The source code is available at: https://github.com/sony/mambapeft. |
| Open Datasets | Yes | We conduct our evaluation on the VTAB-1k image classification dataset (Zhai et al., 2019). ... We adopt pre-trained weights trained with ImageNet-1k (Deng et al., 2009) using the DeiT (Touvron et al., 2021) training framework in all models. ... We experiment with a commonsense reasoning task, following the setup and dataset of Hu et al. (2023). |
| Dataset Splits | Yes | For each task, 1000 images are used for training. ... This experiment uses 170k datasets, in contrast to the 1k used for VTAB-1k. ... We used official test data of VTAB-1k as training data and vice versa. ... Each model is fine-tuned with about 140,000 data for three epochs with a batch size of 16. |
| Hardware Specification | Yes | By processing five tasks in parallel on one A100 GPU, one trial can be completed in around 20 minutes, with minimal dependency on the type and size of the applied PEFT methods. |
| Software Dependencies | No | The paper mentions using the "AdamW optimizer" and "Optuna" for hyperparameter optimization but does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We follow the setup of Jie & Deng (2023) in our experiments, using the AdamW optimizer (Loshchilov & Hutter, 2017) and training the model for 100 epochs. The learning rate is set to 1e-3, with a cosine scheduler and a warmup period of 10 epochs. A weight decay with 1e-4 magnitude is applied. We do not perform data augmentation. ... Each model is fine-tuned with about 140,000 data for three epochs with a batch size of 16. A linear learning rate scheduler is used with a warmup period of 100 iterations. ... The learning rate configurations for language tasks are shown in Table 8. |
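The image-task schedule quoted above (base learning rate 1e-3, linear warmup for 10 epochs, cosine decay over 100 epochs) can be written out explicitly. This is a minimal sketch, assuming a per-epoch schedule with linear warmup followed by cosine annealing to zero; the paper does not specify the exact boundary handling or minimum learning rate, so those details are assumptions here.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=10, total_epochs=100):
    """Learning rate for a given epoch under linear warmup + cosine decay.

    Assumed convention: warmup ramps linearly from base_lr/warmup_epochs
    at epoch 0 up to base_lr, then cosine-annealing decays toward zero
    by the final epoch.
    """
    if epoch < warmup_epochs:
        # Linear warmup over the first `warmup_epochs` epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, the schedule reaches its peak of 1e-3 at the end of warmup and falls to half the base rate at the midpoint of the cosine phase. In a PyTorch training loop this would typically be realized with `torch.optim.AdamW(..., weight_decay=1e-4)` and a warmup-plus-cosine scheduler.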