Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment

Authors: Naoya Hasegawa, Issei Sato

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Additionally, through experiments on long-tailed datasets, we illustrate the practical usefulness of MLA under more realistic conditions. We experimentally validate that the approximation of MLA holds under realistic, non-ideal conditions where NC is not fully realized (Section 5). We provide empirical guidelines for hyperparameter-tuning of MLA (Section 5.4).
Researcher Affiliation | Academia | Naoya Hasegawa & Issei Sato, The University of Tokyo
Pseudocode | Yes | Algorithm 1 outlines the detailed steps of 1vs1adjuster, a 1-vs-1 multi-class classifier that performs classification based on the decision boundaries proposed in Proposition 3.
Open Source Code | Yes | Code is available at https://github.com/HN410/MLA-Approximates-NCDBA.
Open Datasets | Yes | We used CIFAR10, CIFAR100 (Krizhevsky, 2009), and ImageNet (Deng et al., 2009) as datasets. [...] We also used the tabular dataset Helena (Guyon et al., 2019).
Dataset Splits | Yes | For tuning the hyperparameters γ1v1, γ+, γ−, we used validation datasets. Since CIFAR10 and CIFAR100 do not have validation datasets, we created validation datasets using a portion of the training datasets. Following Liu et al. (2019), we constructed the validation datasets by extracting only 20 samples per class from the training datasets and using the remaining samples as the training datasets. [...] For Helena, we randomly sampled 20 non-overlapping samples per class for validation and test sets respectively.
Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A100.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies (e.g., programming language version, specific library versions).
Experiment Setup | Yes | We chose stochastic gradient descent with momentum = 0.9 as the optimizer and applied a cosine learning rate scheduler (Loshchilov & Hutter, 2017) to gradually decrease the learning rate from 0.01 to 0. The batch size was set to 64, and the number of training epochs was 320. The loss function used was cross-entropy loss, and regularization included a weight decay of 0.005 (Hanson & Pratt, 1989) and feature regularization of 0.01 (Hasegawa & Sato, 2023). [...] The learning rate was gradually decreased from 0.05 to 0, with the number of training epochs set to 200. Regularization involved a weight decay of 0.00024 and a feature regularization of 0.00003. [...] The model was trained for 400 epochs using AdamW (Loshchilov & Hutter, 2018). Regularization methods included a 0.15 dropout rate (Srivastava et al., 2014), 0.15 weight decay, and 0.001 feature regularization.
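The Pseudocode row above refers to Algorithm 1's 1vs1adjuster, a 1-vs-1 multi-class classifier built on adjusted decision boundaries. The following is a minimal sketch of that general scheme, not the authors' implementation: each class pair (i, j) is decided by comparing the logit margin against a per-pair boundary shift `b_ij` (here an arbitrary placeholder value; the paper derives the actual thresholds in Proposition 3), and the final label is chosen by majority vote. The function name and the `thresholds` parameter are assumptions for illustration.

```python
# Hedged sketch of a 1-vs-1 classifier with per-pair adjusted decision
# boundaries, in the spirit of Algorithm 1 (not the authors' code).
from itertools import combinations

def one_vs_one_adjusted_predict(logits, thresholds):
    """logits: per-class scores for one sample.
    thresholds: dict mapping a pair (i, j), i < j, to a boundary shift b_ij;
    missing pairs default to the unadjusted boundary 0."""
    k = len(logits)
    votes = [0] * k
    for i, j in combinations(range(k), 2):
        b = thresholds.get((i, j), 0.0)
        # Class i wins the pairwise contest iff its margin over class j
        # exceeds the adjusted boundary; otherwise class j gets the vote.
        if logits[i] - logits[j] > b:
            votes[i] += 1
        else:
            votes[j] += 1
    # Predict the class with the most pairwise wins.
    return max(range(k), key=lambda c: votes[c])
```

Raising `b_ij` makes class i harder to pick against class j, which is how boundary adjustment can favor tail classes in a long-tailed setting.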
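The Dataset Splits row describes extracting 20 samples per class from the training set to form a validation set. A small sketch of that protocol (helper name and seed handling are my own, not from the paper):

```python
# Hedged sketch of the per-class validation split described above:
# take n_val samples of each class for validation, keep the rest for training.
import random
from collections import defaultdict

def split_per_class(labels, n_val=20, seed=0):
    """Return (val_idx, train_idx) index lists over `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    val_idx, train_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)              # random, non-overlapping selection
        val_idx.extend(idxs[:n_val])   # n_val samples per class -> validation
        train_idx.extend(idxs[n_val:]) # remainder -> training
    return val_idx, train_idx
```

Because the count is fixed per class rather than proportional, the validation set stays class-balanced even when the training distribution is long-tailed.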
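The Experiment Setup row quotes a cosine learning rate scheduler (Loshchilov & Hutter, 2017) decaying from 0.01 to 0 over 320 epochs. The standard cosine annealing formula that setup implies can be written as:

```python
# Cosine annealing schedule: lr(t) = lr_min + (lr_max - lr_min) / 2
# * (1 + cos(pi * t / T)). Defaults match the CIFAR setup quoted above
# (0.01 -> 0); the function name is illustrative, not from the paper.
import math

def cosine_lr(epoch, total_epochs, lr_max=0.01, lr_min=0.0):
    t = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

For example, with `total_epochs=320` the schedule starts at 0.01, passes through 0.005 at epoch 160, and reaches 0 at epoch 320.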