Robustness through Data Augmentation Loss Consistency
Authors: Tianjian Huang, Shaunak Ashish Halbe, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that DAIR consistently outperforms ERM and DA-ERM with little marginal computational cost and sets new state-of-the-art results in several benchmarks involving covariant data augmentation. We apply DAIR to real-world learning problems involving covariant data augmentation: robust neural task-oriented dialog state tracking and robust visual question answering. We also apply DAIR to tasks involving invariant data augmentation: robust regression, robust classification against adversarial attacks, and robust ImageNet classification under distribution shift. |
| Researcher Affiliation | Collaboration | Tianjian Huang EMAIL University of Southern California Shaunak Halbe EMAIL Georgia Institute of Technology Chinnadhurai Sankar EMAIL Meta AI Pooyan Amini EMAIL Meta AI Satwik Kottur EMAIL Meta AI Alborz Geramifard EMAIL Meta AI Meisam Razaviyayn EMAIL University of Southern California Ahmad Beirami EMAIL Google Research |
| Pseudocode | Yes | Algorithm 1 Training Neural Networks with GD — 1: Input: number of steps T, training set S, learning rate η, initialized parameters θ₀; 2: for t = 1, 2, …, T do; 3: compute ∇θ Ê[f_DAIR-R,λ(zᵢ, z̃ᵢ; θₜ)]; 4: set θₜ₊₁ = θₜ − η ∇θ Ê[f_DAIR-R,λ(zᵢ, z̃ᵢ; θₜ)]; 5: end for |
| Open Source Code | Yes | Our code for all experiments is available at: https://github.com/optimization-for-data-driven-science/DAIR. |
| Open Datasets | Yes | We apply DAIR to real-world learning problems involving covariant data augmentation: robust neural task-oriented dialog state tracking and robust visual question answering. We also apply DAIR to tasks involving invariant data augmentation: robust regression, robust classification against adversarial attacks, and robust ImageNet classification under distribution shift. ... Among task-oriented dialog datasets, MultiWOZ (Budzianowski et al., 2018) has gained the most popularity ... In this paper, we focus on the Invariant and Covariant VQA (IV/CV-VQA) dataset which contains semantically edited images corresponding to a subset from VQA v2 (Goyal et al., 2017). ... Colored MNIST (Arjovsky et al., 2019) is a binary classification task built on the MNIST dataset. ... Rotated MNIST (Ghifary et al., 2015) is a dataset where MNIST digits are rotated. ... We conduct our experiments on the CIFAR-10 dataset ... In this experiment, we consider a regression task to minimize the root mean square error (RMSE) of the predicted values on samples from the Drug Discovery dataset. ... The ImageNet-9 Background Challenge (Xiao et al., 2020) was proposed to test the background robustness of image classification models. |
| Dataset Splits | Yes | For the VQA v2 dataset, we use the original VQA v2 train split for training, along with the IV-VQA and CV-VQA train splits for augmentation in the DAIR and DA-ERM (Agarwal et al., 2020) settings. ... We train a model consisting of three convolutional layers and two fully connected layers with 20,000 examples. For each dataset we define several different schemes for how the dataset can be modified: Table 10 (Colored MNIST) and Table 11 (Rotated MNIST). ... We evaluate the performance of each algorithm against PGD attacks as well as on clean (attack-free) accuracy. In our approach, the augmented examples z̃ can be generated by a certain strong attack, such as Projected Gradient Descent (PGD) or CW (Carlini & Wagner, 2017). We conduct our experiments on the CIFAR-10 dataset and compare our approach with several other state-of-the-art baselines. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like ParlAI (Miller et al., 2017), BART (Lewis et al., 2019), and Torchvision, but does not provide specific version numbers for these components, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | For training we follow a two-stage schedule with a learning rate of 0.005 for the first 20 epochs and a learning rate of 0.0005 for the next 20. We choose a batch size of 64 for all experiments. (Table 9: Training parameters of MNIST experiments). All the methods are trained for 40 epochs with a learning rate of 0.001 and a batch size of 48. (Section I: Setup and additional results for Visual Question Answering). For training the DAIR model, ... We train the model for 120 epochs with an initial step size of 0.0001 and use a cosine annealing scheduler. (Section K.1: Setups for the main results in Section 4.4). We train the model for 175 epochs with batch size 128, an initial learning rate of 0.1, and decay of 0.1 at epochs 30, 70, 110, and 150. (Section L: Details on training ImageNet-9) |
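The Algorithm 1 excerpt above (gradient descent on the regularized DAIR objective) can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: it assumes a DAIR-SQ-style consistency penalty λ(√ℓ(z) − √ℓ(z̃))² on paired clean/augmented samples, uses a 1-D linear regression loss, and substitutes a central finite difference for the analytic gradient ∇θ. The names `pair_objective` and `num_grad` are ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: clean samples z_i = (x_i, y_i) with y = 2x + noise, and
# "augmented" samples z~_i obtained by perturbing x. The clean/augmented
# pairing is what the DAIR regularizer operates on.
x = rng.uniform(-1.0, 1.0, size=64)
y = 2.0 * x + 0.1 * rng.normal(size=64)
x_aug = x + 0.05 * rng.normal(size=64)

def pair_objective(theta, lam=1.0, eps=1e-12):
    """Mean over pairs of 0.5*(l + l~) + lam*(sqrt(l) - sqrt(l~))^2,
    where l and l~ are squared errors on the clean and augmented samples."""
    l = (theta * x - y) ** 2
    l_aug = (theta * x_aug - y) ** 2
    reg = (np.sqrt(l + eps) - np.sqrt(l_aug + eps)) ** 2
    return np.mean(0.5 * (l + l_aug) + lam * reg)

def num_grad(theta, h=1e-5):
    # Central finite difference stands in for the analytic gradient in step 3.
    return (pair_objective(theta + h) - pair_objective(theta - h)) / (2 * h)

# Steps 2-5 of Algorithm 1: T rounds of theta_{t+1} = theta_t - eta * grad.
theta, eta, T = 0.0, 0.1, 200
history = [pair_objective(theta)]
for _ in range(T):
    theta -= eta * num_grad(theta)
    history.append(pair_objective(theta))
```

Under these assumptions the iterates drive the objective down and `theta` approaches the generating slope of 2; the consistency term additionally penalizes any gap between the clean and augmented losses.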