Deep Augmentation: Dropout as Augmentation for Self-Supervised Learning
Authors: Rickard Brüel Gabrielsson, Tongzhou Wang, Manel Baradad, Justin Solomon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning, we find that uniformly applying dropout across layers does not consistently improve performance. Instead, dropout proves most beneficial in deeper layers and can be matched by alternative augmentations (e.g., PCA). |
| Researcher Affiliation | Academia | Massachusetts Institute of Technology EMAIL |
| Pseudocode | No | The paper describes methods and algorithms textually but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper references code from other works: "For implementation, we utilized the code provided by (Khosla et al., 2020), available at this link." and "We follow the protocol and code of Zhu et al. (2021) that can be found at https://github.com/PyGCL/PyGCL." However, it does not explicitly provide a link or statement for the source code of the methodology described in this paper (Deep Augmentation). |
| Open Datasets | Yes | We validate our approach on Transformers, ResNets, and Graph Neural Networks across multiple data modalities... For images, we employ a ResNet (He et al., 2016) and follow the SimCLR framework (Chen et al., 2020), testing on CIFAR10, CIFAR100, and a 100-class subset of ImageNet (Deng et al., 2009). ...pre-training a BERT transformer (Devlin et al., 2019) on 10^6 randomly sampled sentences from English Wikipedia. Hyperparameters are tuned on the STS-B development set (Cer et al., 2017), and final evaluations are conducted on seven standard semantic textual similarity (STS) tasks (Agirre et al., 2012; Cer et al., 2017; Marelli et al., 2014). ...We evaluate on COLLAB and IMDB-Multi (Yanardag & Vishwanathan, 2015), as well as NCI1 (Wale & Karypis, 2006) and PROTEINS (Borgwardt et al., 2005). |
| Dataset Splits | Yes | Hyperparameters are tuned on the STS-B development set (Cer et al., 2017), and final evaluations are conducted on seven standard semantic textual similarity (STS) tasks (Agirre et al., 2012; Cer et al., 2017; Marelli et al., 2014). ...Hyperparameters are tuned on a validation split, with results reported on a separate test set. ...For our supervised learning experiments, training was conducted for 100 epochs but otherwise using the same hyperparameters as those in the fine-tuning phase post pre-training, which lasted 28 epochs. |
| Hardware Specification | No | Most of these savings are realized on the GPU. The paper mentions the use of GPUs but does not provide specific details such as model numbers or types of GPUs used. |
| Software Dependencies | No | The paper mentions using Python and specific frameworks/libraries like BERT, SimCLR, GCL, GNN, ResNet, Transformers, etc., but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Our experiments were conducted with a batch size of 1024, training each method for 1500 epochs. ...Our fixed 50% dropout rate at a chosen layer still yields higher results... We pre-train for 1000 epochs and use the following data augmentations in GCL: `A.RandomChoice([A.RWSampling(num_seeds=1000, walk_length=10), A.NodeDropping(pn=0.1), A.FeatureMasking(pf=0.1), A.EdgeRemoving(pe=0.1)], 1)`. For our supervised learning experiments, training was conducted for 100 epochs but otherwise using the same hyperparameters as those in the fine-tuning phase post pre-training, which lasted 28 epochs. |
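The core mechanism the table summarizes, using dropout at a single chosen (deep) layer to produce two stochastic "views" of the same input for a contrastive objective, can be sketched as follows. This is a minimal illustration, not the paper's code: the 3-layer MLP, its random weights, and the layer index are hypothetical stand-ins, and the fixed 50% rate mirrors the dropout rate quoted in the Experiment Setup row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer MLP; random weights stand in for a real encoder.
W = [rng.standard_normal((8, 8)) for _ in range(3)]

def forward(x, dropout_layer=None, p=0.5):
    """Forward pass applying dropout only after layer `dropout_layer`.

    Deep Augmentation's finding: applying dropout at a deeper layer,
    rather than uniformly across layers, works best as an augmentation.
    """
    h = x
    for i, w in enumerate(W):
        h = np.maximum(h @ w, 0.0)           # ReLU activation
        if i == dropout_layer:
            mask = rng.random(h.shape) > p   # drop units with prob p
            h = h * mask / (1.0 - p)         # inverted-dropout rescaling
    return h

x = rng.standard_normal(8)
# Two stochastic passes yield two augmented views for a contrastive loss;
# with dropout_layer=None the encoder is deterministic (evaluation mode).
view1 = forward(x, dropout_layer=2)
view2 = forward(x, dropout_layer=2)
```

Because dropout is the only source of randomness, the same call with `dropout_layer=None` is deterministic, which is how the augmentation is switched off at evaluation time.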