Deep Augmentation: Dropout as Augmentation for Self-Supervised Learning
Authors: Rickard Brüel Gabrielsson, Tongzhou Wang, Manel Baradad, Justin Solomon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning, we find that uniformly applying dropout across layers does not consistently improve performance. Instead, dropout proves most beneficial in deeper layers and can be matched by alternative augmentations (e.g., PCA). |
| Researcher Affiliation | Academia | Massachusetts Institute of Technology EMAIL |
| Pseudocode | No | The paper describes methods and algorithms textually but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper references code from other works: "For implementation, we utilized the code provided by (Khosla et al., 2020), available at this link." and "We follow the protocol and code of Zhu et al. (2021) that can be found at https://github.com/PyGCL/PyGCL." However, it does not explicitly provide a link or statement for the source code of the methodology described in this paper (Deep Augmentation). |
| Open Datasets | Yes | We validate our approach on Transformers, ResNets, and Graph Neural Networks across multiple data modalities... For images, we employ a ResNet (He et al., 2016) and follow the SimCLR framework (Chen et al., 2020), testing on CIFAR10, CIFAR100, and a 100-class subset of ImageNet (Deng et al., 2009). ...pre-training a BERT transformer (Devlin et al., 2019) on 10^6 randomly sampled sentences from English Wikipedia. Hyperparameters are tuned on the STS-B development set (Cer et al., 2017), and final evaluations are conducted on seven standard semantic textual similarity (STS) tasks (Agirre et al., 2012; Cer et al., 2017; Marelli et al., 2014). ...We evaluate on COLLAB and IMDB-Multi (Yanardag & Vishwanathan, 2015), as well as NCI1 (Wale & Karypis, 2006) and PROTEINS (Borgwardt et al., 2005). |
| Dataset Splits | Yes | Hyperparameters are tuned on the STS-B development set (Cer et al., 2017), and final evaluations are conducted on seven standard semantic textual similarity (STS) tasks (Agirre et al., 2012; Cer et al., 2017; Marelli et al., 2014). ...Hyperparameters are tuned on a validation split, with results reported on a separate test set. ...For our supervised learning experiments, training was conducted for 100 epochs but otherwise using the same hyperparameters as those in the fine-tuning phase post pre-training, which lasted 28 epochs. |
| Hardware Specification | No | Most of these savings are realized on the GPU. The paper mentions the use of GPUs but does not provide specific details such as model numbers or types of GPUs used. |
| Software Dependencies | No | The paper mentions using Python and specific frameworks/libraries like BERT, SimCLR, GCL, GNN, ResNet, Transformers, etc., but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Our experiments were conducted with a batch size of 1024, training each method for 1500 epochs. ...Our fixed 50% dropout rate at a chosen layer still yields higher results... We pre-train for 1000 epochs and use the following data augmentations in GCL: `A.RandomChoice([A.RWSampling(num_seeds=1000, walk_length=10), A.NodeDropping(pn=0.1), A.FeatureMasking(pf=0.1), A.EdgeRemoving(pe=0.1)], 1)`. For our supervised learning experiments, training was conducted for 100 epochs but otherwise using the same hyperparameters as those in the fine-tuning phase post pre-training, which lasted 28 epochs. |
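The core mechanism the table summarizes, using dropout at a single chosen (deep) layer to produce two stochastic "views" of the same input for a contrastive objective, can be sketched as follows. This is a minimal illustration, not the paper's code: the 3-layer MLP, its random weights, and the layer index are hypothetical stand-ins, and the fixed 50% rate mirrors the dropout rate quoted in the Experiment Setup row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer MLP; random weights stand in for a real encoder.
W = [rng.standard_normal((8, 8)) for _ in range(3)]

def forward(x, dropout_layer=None, p=0.5):
    """Forward pass applying dropout only after layer `dropout_layer`.

    Deep Augmentation's finding: applying dropout at a deeper layer,
    rather than uniformly across layers, works best as an augmentation.
    """
    h = x
    for i, w in enumerate(W):
        h = np.maximum(h @ w, 0.0)           # ReLU activation
        if i == dropout_layer:
            mask = rng.random(h.shape) > p   # drop units with prob p
            h = h * mask / (1.0 - p)         # inverted-dropout rescaling
    return h

x = rng.standard_normal(8)
# Two stochastic passes yield two augmented views for a contrastive loss;
# with dropout_layer=None the encoder is deterministic (evaluation mode).
view1 = forward(x, dropout_layer=2)
view2 = forward(x, dropout_layer=2)
```

Because dropout is the only source of randomness, the same call with `dropout_layer=None` is deterministic, which is how the augmentation is switched off at evaluation time.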