AdaFlood: Adaptive Flood Regularization

Authors: Wonho Bae, Yi Ren, Mohamed Osama Ahmed, Frederick Tung, Danica J. Sutherland, Gabriel L. Oliveira

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments (Section 4) demonstrate that AdaFlood generally outperforms previous flood methods on a variety of tasks, including image and text classification, probability density estimation for asynchronous event sequences, and regression for tabular datasets."
Researcher Affiliation | Collaboration | Wonho Bae (University of British Columbia), Yi Ren (University of British Columbia), Mohamed Osama Ahmed (Borealis AI), Frederick Tung (Borealis AI), Danica J. Sutherland (University of British Columbia & Amii), Gabriel L. Oliveira (Borealis AI)
Pseudocode | Yes | Algorithm 1: Training of Auxiliary Network(s) and AdaFlood
1: Train a single auxiliary network f^aux on the entire training set D   (fine-tuning method only)
2: for D_aux,i in {D_aux,i}_{i=1}^{n} do
3:   Train f^aux_i, either from scratch or by fine-tuning f^aux, on D \ D_aux,i
4:   Save the adaptive flood level θ_i for each x_i ∈ D_aux,i using f^aux_i
5: end for
6: Train the main model f using Equation (3) and the adaptive flood levels θ computed above
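The per-sample objective used in step 6 can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the paper's Equation (3) takes the usual flood form |ℓ_i − θ_i| + θ_i averaged over the batch, with a per-sample flood level θ_i instead of a single global one; the function name is hypothetical.

```python
def adaflood_loss(per_sample_losses, flood_levels):
    """Flood-style regularization with a per-sample flood level theta_i.

    When a sample's loss drops below its flood level, the absolute-value
    term flips sign, so gradient descent pushes that loss back up toward
    theta_i instead of driving it to zero.
    """
    assert len(per_sample_losses) == len(flood_levels)
    n = len(per_sample_losses)
    return sum(abs(l - t) + t
               for l, t in zip(per_sample_losses, flood_levels)) / n
```

For instance, a sample already below its flood level (loss 0.2, θ = 0.5) contributes |0.2 − 0.5| + 0.5 = 0.8, the same as a sample at loss 0.8 above that level: both are pulled toward θ rather than zero.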
Open Source Code | No | From the Reproducibility statement: "For each experiment, we listed implementation details such as model, regularization, and search space for hyperparameters. We also specified the datasets we used for each experiment, and how they were split and augmented, along with a description of the metrics. The code is released with the final version."
Open Datasets | Yes | "We use two popular benchmark datasets, Stack Overflow (predicting the times at which users receive badges) and Reddit (predicting posting times). Following Bae et al. (2023), we also benchmark our method on a dataset with stronger periodic patterns: Uber (predicting pick-up times)... We use SVHN (Netzer et al., 2011), CIFAR-10, and CIFAR-100 (Krizhevsky et al., 2009) for image classification... We also use the tabular datasets Brazilian Houses and Wine Quality from OpenML (Vanschoren et al., 2013)... We further employ Stanford Sentiment Treebank (SST-2)... We use ImageNet100 (Tian et al., 2020) for image classification... We use the NYC Taxi Tip dataset from OpenML (Vanschoren et al., 2013)..."
Dataset Splits | Yes | "We split each training dataset into train (80%) and validation (20%) sets. Details are provided in Appendix A. ...we split each training dataset into train (80%) and validation (20%) sets for hyperparameter search; thus our numbers are generally somewhat worse than what they reported, as we do not directly tune on the test set. ...We typically use five-fold cross-validation as a reasonable trade-off between computational expense and good-enough models to estimate θi"
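The five-fold scheme for estimating θ_i can be sketched as below: each auxiliary network is trained on all folds but one and then scores only the held-out fold, so every training point receives a flood level from a model that never saw it. This is an illustrative sketch under that reading, not the authors' code; the helper name is hypothetical, and `n_folds=5` follows the paper's stated choice.

```python
def heldout_folds(n_samples, n_folds=5):
    """Yield (train_indices, heldout_indices) pairs.

    The auxiliary network f^aux_i is trained on train_indices and used
    only to compute flood levels theta for heldout_indices.
    """
    indices = list(range(n_samples))
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, held_out
```

Each sample appears in exactly one held-out fold, so the adaptive flood levels cover the whole training set while remaining out-of-sample estimates.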
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or processor types) are provided for running the experiments. The paper discusses computational costs but not the hardware used.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions) are explicitly mentioned in the paper.
Experiment Setup | Yes | "For each dataset, we conduct hyperparameter tuning for the learning rate and the weight for L2 regularization with the unregularized baseline (we still apply early stopping and L2 regularization by default). Once the learning rate and weight-decay parameters are fixed, we search for the optimal flood levels. The optimal flood levels are selected via a grid search on {−50, −45, −40, …, 0, 5} ∪ {−4, −3, …, 3, 4} for Flood and iFlood, and the optimal γ on {0.0, 0.1, …, 0.9} for AdaFlood using the validation set."
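The tuning protocol above amounts to a plain validation-set grid search over one hyperparameter at a time. A minimal sketch, assuming a callback that trains and evaluates with a given setting (the `val_loss` helper and function name are hypothetical, not from the paper):

```python
def grid_search(candidates, val_loss):
    """Return the candidate that minimizes validation loss.

    candidates: iterable of hyperparameter values, e.g. flood levels for
    Flood/iFlood or gamma values for AdaFlood.
    val_loss: callable mapping a candidate to its validation loss
    (i.e. it trains and evaluates a model with that setting).
    """
    return min(candidates, key=val_loss)

# The gamma grid for AdaFlood described in the setup above:
gammas = [round(0.1 * k, 1) for k in range(10)]  # 0.0, 0.1, ..., 0.9
```

Because the learning rate and weight decay are fixed first, each flood-level or γ candidate requires only one additional training run, keeping the search cost linear in the grid size.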