Equivariant Neural Functional Networks for Transformers
Authors: Viet-Hoang Tran, Thieu Vo, An Nguyen, Tho-Huu Tran, Minh-Khoi Nguyen-Nhat, Thanh Tran, Duy-Tung Pham, Tan Nguyen
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, we release a dataset of over 125,000 Transformers model checkpoints trained on two datasets with two tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance. We empirically demonstrate that Transformer-NFN consistently outperforms other baseline models on our constructed datasets. Through comprehensive ablation studies, we emphasize Transformer-NFN's ability to effectively capture information within the transformer block, establishing it as a robust predictor of model generalization. |
| Researcher Affiliation | Collaboration | 1National University of Singapore; 2FPT Software AI Center, Vietnam; 3VinUniversity, Vietnam |
| Pseudocode | Yes | Appendix I: Implementation of Equivariant and Invariant Layers. I.1 Summary of Equivariant and Invariant Layers (I.1.1 Equivariant Layers with bullet notation; I.1.2 Invariant Layers with bullet notation). I.2 Equivariant Layers Pseudocode: I.2.1 [E(W)]^(Q:i)_{j,k}; I.2.2 [E(W)]^(K:i)_{j,k}; I.2.3 [E(W)]^(V:i)_{j,k}; I.2.4 [E(W)]^(O:i)_{j,k}; I.2.5 [E(W)]^(A)_{j,k}; I.2.6 [E(b)]^(A)_k; I.2.7 [E(W)]^(B)_{j,k}; I.2.8 [E(b)]^(B)_k. I.3 Invariant Layers Pseudocode. |
| Open Source Code | Yes | The code is publicly available at https://github.com/MathematicalAI-NUS/Transformer-NFN. Reproducibility Statement: Source codes for our experiments are provided in the supplementary materials of the paper. |
| Open Datasets | Yes | Additionally, we release a dataset of over 125,000 Transformers model checkpoints trained on two datasets with two tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance. 4. We release the Small Transformer Zoo dataset, which consists of more than 125,000 Transformers model checkpoints trained on two different tasks: digit image classification on MNIST and text topic classification on AGNews. To our knowledge, this is the first dataset of its kind. Reproducibility Statement: ... All datasets used in this paper are publicly available through an anonymous link provided in the README file of the supplementary material. |
| Dataset Splits | No | The paper uses the Small Transformer Zoo dataset, which consists of model checkpoints. For the experiments, it states: "we evaluate each model's prediction performance not only on the entire dataset but also on four smaller subsets, each filtered by accuracy thresholds of 20%, 40%, 60%, and 80%." While this describes evaluation subsets, it does not explicitly provide the training/test/validation splits used for the Transformer-NFN itself, nor the splits for the underlying MNIST and AGNews datasets used to train the transformer models in the zoo. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions computational aspects in a general sense, for example, for efficiency: "enabling efficient and highly parallelizable computations on modern GPUs.". |
| Software Dependencies | No | The paper mentions several software components and libraries, such as "Adam optimizer", "XGBoost (Chen & Guestrin, 2016)", "Light GBM (Ke et al., 2017)", "Random Forest (Breiman, 2001)", and concepts like "einsum". However, it does not provide specific version numbers for any of these components, which are necessary for reproducible software dependency information. |
| Experiment Setup | Yes | To create a wide range of transformer models, we opt to vary six hyperparameters in our experiments: train fraction, optimizer (SGD, SGDm, Adam, or RMSprop), learning rate, L2 regularization coefficient, weight initialization standard deviation, and dropout probability. ... Table 4 provides a detailed overview of our hyperparameter configurations. Overall, there are 8,000 configurations for each category, resulting in 16,000 configurations in total. These configurations are consistently applied across both tasks to ensure comparability. All models are trained for 100 epochs, with checkpoints and accuracy measurements recorded at epochs 50, 75, 100, and at the epoch with the best accuracy. Training details: The models were trained for a total of 50 epochs, using a batch size of 16. We employed the Adam optimizer with a maximum learning rate of 10^-3. A linear warmup strategy was applied to the learning rate, spanning the initial 10 epochs for gradual warmup. We utilize Binary Cross Entropy for the loss function. In our experimental setup, the embedding component is modeled using a single-layer MLP with 10 hidden neurons, while the classifier component is a two-layer MLP, each layer containing 10 hidden neurons. |
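For intuition on what the paper's equivariant layers respect, the underlying weight-space symmetry can be illustrated on a toy two-layer MLP: permuting the hidden neurons (rows of the first weight matrix together with the matching columns of the second) leaves the network's function unchanged. The sketch below is a generic illustration of this symmetry, not the paper's Transformer-specific construction:

```python
import numpy as np

# Toy illustration (not the paper's layers): permuting hidden neurons of a
# two-layer ReLU MLP leaves its input-output function unchanged. NFN-style
# architectures are built to be equivariant/invariant to such permutations.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 4)), rng.normal(size=10)   # hidden layer
W2, b2 = rng.normal(size=(3, 10)), rng.normal(size=3)    # output layer

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

perm = rng.permutation(10)          # a permutation of the hidden neurons
x = rng.normal(size=4)
y = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)
assert np.allclose(y, y_perm)       # same function, permuted weights
```

A network that predicts generalization from raw weights should give the same answer for both weight settings, which is exactly the invariance the paper's readout layers enforce.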
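The hyperparameter sweep described in the Experiment Setup row can be sketched as a Cartesian-product grid. The six varied hyperparameters and the four optimizer choices are from the paper; the concrete values below are placeholders, since this report does not reproduce the paper's Table 4:

```python
from itertools import product

# Hypothetical grid in the spirit of the paper's Table 4. Only the six
# hyperparameter names and the optimizer options come from the paper;
# the numeric values are assumed for illustration.
grid = {
    "train_fraction": [0.25, 0.5, 1.0],               # assumed values
    "optimizer": ["SGD", "SGDm", "Adam", "RMSprop"],  # from the paper
    "learning_rate": [1e-2, 1e-3, 1e-4],              # assumed values
    "l2_coeff": [0.0, 1e-4],                          # assumed values
    "init_std": [0.02, 0.1],                          # assumed values
    "dropout": [0.0, 0.1],                            # assumed values
}

# One config dict per point in the product of all value lists.
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
print(len(configs))  # 3*4*3*2*2*2 = 288 for this toy grid
```

The paper's actual grid is larger (8,000 configurations per category, 16,000 in total), but the enumeration pattern is the same.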
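The predictor's learning-rate schedule (linear warmup to 10^-3 over the first 10 of 50 epochs) could look like the following sketch; holding the rate constant after warmup is an assumption, since the paper excerpt describes no decay:

```python
def lr_at_epoch(epoch, max_lr=1e-3, warmup_epochs=10, total_epochs=50):
    """Linear warmup to max_lr over the first warmup_epochs, then constant.

    The warmup length and max_lr follow the paper's training details; the
    constant rate after warmup is an assumption (no decay is described).
    """
    if epoch < warmup_epochs:
        return max_lr * (epoch + 1) / warmup_epochs
    return max_lr
```

For example, `lr_at_epoch(0)` gives one tenth of the maximum rate and `lr_at_epoch(9)` reaches the full 10^-3.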