Banyan: Improved Representation Learning with Explicit Structure
Authors: Mattia Opper, Siddharth N
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present Banyan, a model that efficiently learns semantic representations by leveraging explicit hierarchical structure. ... It excels in low-resource settings, offering a viable alternative for under-represented languages and highlighting its potential for efficient, interpretable NLP in resource-constrained environments. ... 5. Experiments: English Evaluation ... 6. Experiments: Multilingual Evaluation ... 7. Improvements and Ablations |
| Researcher Affiliation | Academia | 1School of Informatics, University of Edinburgh, UK. Correspondence to: Mattia Opper <EMAIL>, N. Siddharth <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 BANYAN: Entangled Compose. Input: global frontier {(s_n, e_n)}_{n=1}^{N}, compose φ(·), concat (·), similarity CSIM(e, e'). 1: A ← {(s_n, e_n)}_{n=1}^{N} (initialise frontier); 2: (V, E) ← (∅, ∅) (initialise graph); 3: while ∃i : s_i·s_{i+1} ∉ V do; 4: i* ← arg max_i CSIM(e_i, e_{i+1}) (locate closest pair); 5: e_p ← φ(e_{i*}, e_{i*+1}) (compose); 6: V ← V ∪ {(s_{i*}·s_{i*+1}, e_p)}; 7: E ← E ∪ {p→i*, p→(i*+1)}; 8: J ← {j : (s_j, s_{j+1}) = (s_{i*}, s_{i*+1})} (locate all occurrences of this pair); 9: A ← A \ ∪_{j∈J} {A_j, A_{j+1}} (delete occurrences from those locations); 10: A ← A ∪_J {(s_{i*}·s_{i*+1}, e_p)} (insert composition into those locations); return: graph (V, E) |
| Open Source Code | Yes | 1Code available at: github.com/exlab-research/Banyan |
| Open Datasets | Yes | On the word level, we use Simlex-999 (Hill et al., 2015) and WordSim-S/R (Agirre et al., 2009). On the sentence level, we use STS-12 through 16 (Agirre et al., 2012; 2013; 2014; 2015; 2016), the STS-B (Cer et al., 2017), SICK-R (Marelli et al., 2014) and SemRel (Ousidhoum et al., 2024) ... We use two retrieval datasets from the BEIR suite (Thakur et al., 2021). ... We also include two test sets from the GLUE benchmark (Wang et al., 2019). ... For SELF-STRAE, BANYAN and Sent2Vec we pre-train on a uniform subsample of English Wikipedia ... Meanwhile for GLOVE and ROBERTA we pre-train on Wiki-103 (Merity et al., 2016) ... For Afrikaans, Spanish and Amharic we obtained corpora from the Leipzig Corpora Collection (Goldhahn et al., 2012). For Amharic we utilised an MIT-licensed pre-training set of 1 million sequences available at this link. Hausa data was sourced from Opus (Nygaard & Tiedemann, 2003). |
| Dataset Splits | No | The paper utilizes several benchmark datasets like Simlex-999, WordSim-S/R, STS-B, SICK-R, SemRel, Quora, Arguana, SST-2, and MRPC, which typically have predefined test sets or splits. However, the paper itself does not explicitly detail the training, validation, or test splits (e.g., percentages, sample counts) used for its experiments. It refers to 'test sets' for evaluation but does not specify the splitting methodology or exact counts. |
| Hardware Specification | Yes | On a single Nvidia A40 GPU with a batch size of 1024, Banyan trains from scratch in under 50 minutes... XLM-R runs at batch size 128 across 4x A40 cards. |
| Software Dependencies | No | We trained SELF-STRAE and BANYAN for 15 epochs (circa 15k steps and sufficient for convergence) using the Adam optimiser (Kingma & Ba, 2015)... To process the graphs we used DGL (Wang et al., 2020)... We used the Transformers library to implement and train the model (Wolf et al., 2020)... We utilise a pretrained BPE tokeniser for each language from the BPEMB Python package (Heinzerling & Strube, 2018). While these software components are mentioned, specific version numbers for DGL, Transformers library, BPEMB, or other key libraries are not provided. |
| Experiment Setup | Yes | For all models we set the embedding size to 256. For SELF-STRAE we use the configuration of Opper et al. (2023b) and set embeddings as square matrices (i.e., K=16 and U=16). For BANYAN we set these values to K=128 and U=2... We trained SELF-STRAE and BANYAN for 15 epochs (circa 15k steps, sufficient for convergence) using the Adam optimiser (Kingma & Ba, 2015), with a learning rate of 1e-3 for BANYAN and 1e-4 for SELF-STRAE and a batch size of 512... ROBERTA medium was trained for 200,000 steps (10% of which were used for warmup). We used a learning rate of 5e-5 and a linear schedule... For XLM-R we finetune for 100k steps with early stopping, using a linearly scheduled learning rate of 5e-5 with 10% of steps as warmup. XLM-R runs at batch size 128... |
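The Entangled Compose pseudocode quoted above can be sketched in plain Python. This is a minimal illustrative reading of Algorithm 1, not the released implementation: cosine similarity stands in for CSIM, `compose` is left as a caller-supplied function, the graph is a node dict plus parent→child edge list rather than the paper's index-based edges, and the loop terminates when the frontier collapses to one node (a simplification of the paper's `while ∃i : s_i·s_{i+1} ∉ V` condition). The "entangled" step is lines 8–10: every occurrence of the chosen adjacent pair in the frontier is replaced by the same merged node.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def entangled_compose(frontier, compose):
    """Greedy sketch of Algorithm 1 (Entangled Compose).

    frontier: list of (string, embedding) pairs.
    compose:  function merging two embeddings into one.
    Returns (V, E): nodes keyed by merged string, parent->child edges.
    """
    A = list(frontier)
    V, E = {}, []
    while len(A) > 1:
        # line 4: locate the closest adjacent pair
        i = max(range(len(A) - 1),
                key=lambda k: cosine(A[k][1], A[k + 1][1]))
        s_l, s_r = A[i][0], A[i + 1][0]
        e_p = compose(A[i][1], A[i + 1][1])          # line 5: compose
        merged = s_l + s_r
        V[merged] = e_p                               # line 6: add node
        E.append((merged, s_l))                       # line 7: parent->child edges
        E.append((merged, s_r))
        # lines 8-10: replace EVERY occurrence of this pair in the frontier
        new_A, j = [], 0
        while j < len(A):
            if j + 1 < len(A) and A[j][0] == s_l and A[j + 1][0] == s_r:
                new_A.append((merged, e_p))
                j += 2
            else:
                new_A.append(A[j])
                j += 1
        A = new_A
    return V, E
```

On a frontier containing the pair ("a", "b") twice, a single composition step replaces both occurrences with one shared merged node, so the whole sequence resolves in two merges rather than three.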
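The hyperparameters quoted in the Experiment Setup row imply a consistency constraint worth checking: both factorisations satisfy K × U = embedding size (16 × 16 = 256 for SELF-STRAE, 128 × 2 = 256 for BANYAN). A minimal config sketch collecting the reported values; the dict keys and names are illustrative, not taken from the released code:

```python
# Training settings as reported in the paper's setup; key names are illustrative.
CONFIGS = {
    "banyan":     dict(embed_dim=256, K=128, U=2,  lr=1e-3,
                       batch_size=512, epochs=15, optimizer="Adam"),
    "self_strae": dict(embed_dim=256, K=16,  U=16, lr=1e-4,
                       batch_size=512, epochs=15, optimizer="Adam"),
}

for name, cfg in CONFIGS.items():
    # the embedding is factorised into K channels of size U,
    # so K * U must equal the embedding dimension
    assert cfg["K"] * cfg["U"] == cfg["embed_dim"], name
```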