On the Dynamics Under the Unhinged Loss and Beyond

Authors: Xiong Zhou, Xianming Liu, Hanzhang Wang, Deming Zhai, Junjun Jiang, Xiangyang Ji

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically demonstrate these theoretical results and insights through extensive experiments.
Researcher Affiliation | Academia | Xiong Zhou, Xianming Liu, Hanzhang Wang, Deming Zhai, Junjun Jiang (School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China); Xiangyang Ji (Department of Automation, Tsinghua University, Beijing, 100084, China)
Pseudocode | No | The paper describes dynamics using mathematical equations and theorems but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that source code is released nor provides a link to a code repository.
Open Datasets | Yes | We conduct experiments on widely-used classification datasets including CIFAR-10, CIFAR-100, and ImageNet-100. ...For classification experiments in Figure 2, Figure 12, Figure 13, Figure 14, we experiment with ResNet-18, ResNet-34, and ResNet-50 (He et al., 2016) trained on CIFAR-10, CIFAR-100 (Krizhevsky and Hinton, 2009), and ImageNet-100, which takes the first 100 classes of ImageNet (Deng et al., 2009), respectively.
Dataset Splits | Yes | We only use the imbalanced versions of CIFAR-10 and CIFAR-100, following the setting in (Zhou et al., 2022d). The number of training examples is reduced per class while the test set is kept unchanged, and we use the imbalance ratio ρ = max_i n_i / min_i n_i to denote the ratio between the sample sizes of the most frequent and least frequent classes. Moreover, both long-tailed imbalance (Cui et al., 2019), which applies an exponential decay to the per-class sample sizes, and step imbalance (Buda et al., 2018), which sets all minority classes to the same number of samples (as for all majority classes), are considered.
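The two imbalance schemes quoted above can be sketched in a few lines. This is a hedged illustration of the standard constructions (the exponential form follows the convention of Cui et al., 2019); the function names and the convention that exactly half the classes are majority classes under step imbalance are assumptions, not taken from the paper.

```python
def longtail_sizes(n_max, num_classes, rho):
    """Long-tailed imbalance: n_i = n_max * rho^(-i/(C-1)),
    so the ratio between the largest and smallest class is rho."""
    return [int(n_max * rho ** (-i / (num_classes - 1)))
            for i in range(num_classes)]

def step_sizes(n_max, num_classes, rho):
    """Step imbalance: majority classes keep n_max samples,
    minority classes all get n_max / rho (half/half split assumed)."""
    half = num_classes // 2
    return [n_max if i < half else int(n_max / rho)
            for i in range(num_classes)]

# CIFAR-10 with 5000 images per class and imbalance ratio rho = 100:
sizes = longtail_sizes(5000, 10, 100)
print(sizes[0], sizes[-1])  # most vs. least frequent class: 5000 50
```

Here ρ = 5000 / 50 = 100, matching the quoted definition ρ = max_i n_i / min_i n_i.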
Hardware Specification | No | The paper mentions using ResNet architectures but does not specify any hardware components (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | For all training, we use the SGD optimizer with momentum 0.9 and cosine learning rate annealing (Loshchilov and Hutter, 2017) with Tmax set to the corresponding number of epochs. The paper mentions PyTorch once in a footnote, but no version numbers are provided for any software, libraries, or frameworks used.
Experiment Setup | Yes | The networks are trained for 200 epochs on CIFAR-10/-100 and 100 epochs on ImageNet-100. For all training, we use the SGD optimizer with momentum 0.9 and cosine learning rate annealing (Loshchilov and Hutter, 2017) with Tmax set to the corresponding number of epochs. The initial learning rate is set to 0.1, weight decay to 5e-4, and batch size to 256.
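The reported hyperparameters map directly onto a standard PyTorch configuration. This is a minimal sketch under the assumption that the authors used the stock ResNet-18 and the built-in cosine scheduler; the model choice and the commented DataLoader line are illustrative placeholders, not taken from the paper.

```python
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # placeholder architecture
epochs = 200  # 200 for CIFAR-10/-100, 100 for ImageNet-100

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,            # initial learning rate
    momentum=0.9,
    weight_decay=5e-4,
)
# Cosine annealing (Loshchilov and Hutter, 2017) with T_max = number of epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Batch size 256 would be passed to the DataLoader, e.g.:
# loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True)
```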