Attentional-Biased Stochastic Gradient Descent

Authors: Qi Qi, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4, "Experimental Results on Data Imbalance Problem. We conduct experiments on multiple imbalanced benchmark datasets..."; Section 5, "Experimental Results on Label Noise Problem. We provide empirical studies on the noisy label datasets in this section."
Researcher Affiliation | Collaboration | Qi Qi (1), Yi Xu (2), Wotao Yin (2), Rong Jin (3), Tianbao Yang (4)*... (1) Department of Computer Science, The University of Iowa; (2) Alibaba Group; (3) Meta Inc.; (4) Department of Computer Science and Engineering, Texas A&M University
Pseudocode | Yes | Algorithm 1 ABSGD (λ, η, γ, β, s0, w0, T)
Open Source Code | No | The paper does not contain an explicit statement releasing its own source code or linking to a code repository. The text "We have implemented the newly proposed structure, named as ConvMixer (Trockman & Kolter, 2022)" implies an external implementation was used or adapted, not that code was released.
Open Datasets | Yes | "We conduct experiments on multiple imbalanced benchmark datasets, including CIFAR-10 (LT), CIFAR-10 (ST), CIFAR-100 (LT), CIFAR-100 (ST), ImageNet-LT (Liu et al., 2019), Places-LT (Zhou et al., 2017), iNaturalist2018 (iNaturalist, 2018), and Clothing1M (Xiao et al., 2015) datasets."
Dataset Splits | Yes | "The original CIFAR-10 and CIFAR-100 datasets contain 50,000 training images and 10,000 validation images, with 10 and 100 classes, respectively. We construct the imbalanced versions of the CIFAR-10 and CIFAR-100 training sets following two strategies, Long-Tailed (LT) imbalance (Cao et al., 2019) and Step (ST) imbalance (Buda et al., 2018), with two imbalance ratios ρ = 10 and ρ = 100, and keep the test set unchanged."
Hardware Specification | Yes | "To show the efficiency of ABSGD, we conduct an experiment on CIFAR-10 data with different networks on an NVIDIA GeForce GTX 1080 Ti."
Software Dependencies | No | The paper mentions models such as ResNets and ConvMixer, but does not list specific software dependencies (libraries, frameworks) with version numbers.
Experiment Setup | Yes | "For a fair comparison, ABSGD is implemented with the same hyperparameters (momentum parameter, initial step size, weight decay, and step-size decay strategy) as the baseline momentum SGD method. For ABSGD, the moving-average parameter γ is tuned in [0.1 : 0.1 : 1] by default. Following the experimental setting in the literature, the initial learning rate is 0.1 and decays by a factor of 100 at the 160-th and 180-th epochs for both ABSGD and SGD. The value of λ in ABSGD is tuned in [1 : 1 : 10]."
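The Pseudocode row lists the signature Algorithm 1 ABSGD (λ, η, γ, β, s0, w0, T). Under a common reading of those parameters (λ a temperature on the loss, γ a moving-average factor for the normalizer, η a step size), one step of such an attention-weighted SGD rule might look like the sketch below; the function name, array shapes, and exact update are our assumptions, not the paper's code:

```python
import numpy as np

def absgd_step(w, losses, grads, s_prev, lam=1.0, gamma=0.5, eta=0.1):
    """One attention-weighted SGD step (illustrative sketch, not the paper's code).

    w:      parameter vector, shape (d,)
    losses: per-example losses for the mini-batch, shape (B,)
    grads:  per-example gradients, shape (B, d)
    s_prev: moving-average normalizer carried over from the previous step
    """
    u = np.exp(losses / lam)                       # higher-loss examples get larger scores
    s = (1.0 - gamma) * s_prev + gamma * u.mean()  # moving-average estimate of the normalizer
    p = u / (len(losses) * s)                      # normalized per-example attention weights
    g = (p[:, None] * grads).sum(axis=0)           # attention-weighted mini-batch gradient
    return w - eta * g, s
```

As λ grows the weights approach the uniform 1/B of plain SGD, while a small λ concentrates weight on hard (high-loss) examples, which is the intuition behind using such a rule on imbalanced data.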
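The Dataset Splits row references the Long-Tailed (Cao et al., 2019) and Step (Buda et al., 2018) imbalance constructions. A sketch of the per-class sample counts under the conventional formulas (the exact formulas and integer truncation are our assumption):

```python
def longtail_counts(n_max, num_classes, rho):
    """Long-Tailed (LT) imbalance: class k keeps n_max * rho**(-k/(K-1))
    examples, so the rarest class has n_max / rho (imbalance ratio rho)."""
    K = num_classes
    return [int(n_max * rho ** (-k / (K - 1))) for k in range(K)]

def step_counts(n_max, num_classes, rho):
    """Step (ST) imbalance: the first half of the classes keep n_max
    examples each, the second half keep n_max / rho each."""
    half = num_classes // 2
    return [n_max if k < half else int(n_max / rho) for k in range(num_classes)]
```

For CIFAR-10 with n_max = 5000 and ρ = 100, the rarest class keeps 50 training images under either strategy; the test set is left balanced.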
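The Experiment Setup row quotes a step-decay schedule and two tuning grids. One plausible reading, in which the learning rate is divided by 100 at each of the two milestone epochs (this interpretation, common in CIFAR-LT training recipes, is ours):

```python
def lr_at(epoch, base_lr=0.1):
    """Step-decay schedule: divide the LR by 100 at epoch 160 and again at 180
    (one reading of the quoted setup, not confirmed by the paper)."""
    if epoch >= 180:
        return base_lr / 100 / 100
    if epoch >= 160:
        return base_lr / 100
    return base_lr

# Tuning grids as quoted: gamma in [0.1 : 0.1 : 1], lambda in [1 : 1 : 10].
gamma_grid = [round(0.1 * i, 1) for i in range(1, 11)]
lambda_grid = list(range(1, 11))
```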