On Exact Bit-level Reversible Transformers Without Changing Architecture
Authors: Guoqiang Zhang, Jp Lewis, W. Bastiaan Kleijn
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in natural language generation, image classification, and language translation show that BDIA-transformers outperform their conventional counterparts significantly in terms of validation performance while also requiring considerably less training memory. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, University of Exeter, UK 2NVIDIA, USA 3School of Engineering and Computer Science, Victoria University of Wellington, New Zealand. |
| Pseudocode | No | The paper describes methods using mathematical equations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source-code can be found via this link. Four open-source repositories were used in the experiments (see Table 7 in Appendix G). |
| Open Datasets | Yes | We consider fully fine-tuning GPT2 medium ... by using the E2E dataset (Novikova et al., 2017). ...we trained BDIA-ViT ... on CIFAR10 and CIFAR100... The dataset being used is from Kaggle (Kelly, 2020). ...train BDIA-GPT2 on the openwebtext dataset. |
| Dataset Splits | No | The paper mentions using specific datasets like E2E, CIFAR10, CIFAR100, and openwebtext, and notes the use of a "0.05% subset" from openwebtext. However, it does not explicitly provide the train/validation/test split percentages or sample counts for these datasets in the main text, nor does it cite specific predefined splits for reproduction. |
| Hardware Specification | Yes | In this experiment, we trained BDIA-ViT with K=6 transformer blocks on CIFAR10 and CIFAR100 by using a single 2080 Ti GPU. |
| Software Dependencies | No | The paper refers to using open-source repositories (Table 7) for implementing experiments, but it does not explicitly list key software components with their specific version numbers (e.g., Python, PyTorch, CUDA versions) within the text. |
| Experiment Setup | Yes | For comparison, we also fine-tune GPT2 directly and via the LoRA technique with the default setup of (rank, α) = (4, 32). ...we utilized the SET-Adam optimizer (Zhang, 2024) in the training process with the configuration (η0, β1, β2, ϵ) = (1e-4, 0.9, 0.999, 1e-18), where η0 denotes the initial learning rate. The dropout rate was set to 0.1 to reduce over-fitting. ...The peak memory includes both the model parameters and the training states for a batch size of 128. ...The tested BDIA-transformer has six transformer blocks in both the encoder and decoder. |
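The hyperparameters quoted in the Experiment Setup row can be gathered into one configuration sketch. The paper does not show SET-Adam's actual interface, so plain dictionaries with assumed, PyTorch-style key names stand in; this is a minimal sketch of the reported values, not the authors' code:

```python
# Hedged sketch of the reported training configuration.
# Key names are assumptions; only the numeric values come from the paper.

# SET-Adam (Zhang, 2024) configuration: (eta_0, beta_1, beta_2, eps)
optimizer_config = {
    "lr": 1e-4,             # eta_0, initial learning rate
    "betas": (0.9, 0.999),  # beta_1, beta_2
    "eps": 1e-18,           # epsilon
}

# Default LoRA setup used for the GPT2 fine-tuning comparison
lora_config = {
    "rank": 4,    # LoRA rank r
    "alpha": 32,  # LoRA scaling factor
}

# Remaining reported settings
train_config = {
    "dropout": 0.1,     # to reduce over-fitting
    "batch_size": 128,  # batch size used for the peak-memory measurement
    "num_blocks": 6,    # transformer blocks in both encoder and decoder
}
```

The dictionaries are only a compact restatement of the row above; mapping them onto a real optimizer or LoRA implementation would depend on the repositories listed in Table 7 of the paper.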