On the Dynamics Under the Unhinged Loss and Beyond
Authors: Xiong Zhou, Xianming Liu, Hanzhang Wang, Deming Zhai, Junjun Jiang, Xiangyang Ji
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically demonstrate these theoretical results and insights through extensive experiments. |
| Researcher Affiliation | Academia | Xiong Zhou, Xianming Liu, Hanzhang Wang, Deming Zhai, Junjun Jiang (School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China); Xiangyang Ji (Department of Automation, Tsinghua University, Beijing, 100084, China) |
| Pseudocode | No | The paper describes dynamics using mathematical equations and theorems but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or provide a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on widely-used classification datasets including CIFAR-10, CIFAR-100, and ImageNet-100. ... For classification experiments in Figure 2, Figure 12, Figure 13, Figure 14, we experiment with ResNet-18, ResNet-34, and ResNet-50 (He et al., 2016) trained on CIFAR-10, CIFAR-100 (Krizhevsky and Hinton, 2009), and ImageNet-100, which takes the first 100 classes of ImageNet (Deng et al., 2009), respectively. |
| Dataset Splits | Yes | We only use imbalanced versions of CIFAR-10 and CIFAR-100, following the setting in (Zhou et al., 2022d). The number of training examples per class is reduced while the test set remains unchanged; the imbalance ratio ρ = max_i n_i / min_i n_i denotes the ratio between the sample sizes of the most frequent and least frequent classes. Moreover, long-tailed imbalance (Cui et al., 2019), which applies an exponential decay to sample sizes, and step imbalance (Buda et al., 2018), which sets all minority classes to the same number of samples (as with all majority classes), are both considered. |
| Hardware Specification | No | The paper mentions using ResNet architectures but does not specify any hardware components (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | For all training, we use the SGD optimizer with momentum 0.9 and cosine learning rate annealing (Loshchilov and Hutter, 2017) with Tmax set to the corresponding number of epochs. The paper mentions PyTorch once in a footnote. However, no specific version numbers are provided for any software, libraries, or frameworks used. |
| Experiment Setup | Yes | The networks are trained for 200 epochs on CIFAR-10/-100 and 100 epochs on ImageNet-100. For all training, we use the SGD optimizer with momentum 0.9 and cosine learning rate annealing (Loshchilov and Hutter, 2017) with Tmax set to the corresponding number of epochs. The initial learning rate is set to 0.1, weight decay is set to 5e-4, and batch size is set to 256. |
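The two imbalance schemes quoted in the Dataset Splits row can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `n_max = 5000` and `num_classes = 10` are assumed example values for CIFAR-10, and `rho` is the imbalance ratio max_i n_i / min_i n_i.

```python
def long_tailed_counts(n_max, num_classes, rho):
    """Long-tailed imbalance (Cui et al., 2019): exponential decay in
    per-class sample sizes from n_max down to n_max / rho."""
    return [int(n_max * rho ** (-i / (num_classes - 1)))
            for i in range(num_classes)]

def step_counts(n_max, num_classes, rho):
    """Step imbalance (Buda et al., 2018): all majority classes keep
    n_max samples, all minority classes get n_max / rho samples."""
    half = num_classes // 2
    return [n_max if i < half else int(n_max / rho)
            for i in range(num_classes)]

# Example: CIFAR-10-style setup with imbalance ratio rho = 100.
lt = long_tailed_counts(5000, 10, rho=100)
st = step_counts(5000, 10, rho=100)
print(lt)  # most frequent class has 100x the samples of the least frequent
print(st)
```

Both schemes preserve the reported ratio ρ between the most and least frequent class; they differ only in how intermediate classes interpolate between the two extremes.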
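The training configuration reported in the Experiment Setup row maps directly onto standard PyTorch components. A minimal sketch, assuming PyTorch is available (the paper mentions it only in a footnote); the `Linear` model is a placeholder for the ResNet-18/34/50 architectures actually used, and the training loop body is elided:

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder for ResNet-18/34/50
epochs = 200  # 200 for CIFAR-10/-100, 100 for ImageNet-100

# SGD with momentum 0.9, initial lr 0.1, weight decay 5e-4, as reported.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Cosine annealing with T_max equal to the total number of epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=epochs)

for epoch in range(epochs):
    # ... one pass over a DataLoader with batch_size=256 ...
    scheduler.step()
```

With T_max equal to the epoch count, the learning rate follows one half-period of a cosine, decaying from 0.1 to 0 by the final epoch.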