An Analytical Model for Overparameterized Learning Under Class Imbalance
Authors: Eliav Mor, Yair Carmon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our theoretical findings on simulated data and imbalanced CIFAR10, MNIST and Fashion MNIST datasets. |
| Researcher Affiliation | Academia | Eliav Mor EMAIL Department of Computer Science Tel Aviv University Yair Carmon EMAIL Department of Computer Science Tel Aviv University |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It primarily uses mathematical notation and textual descriptions for its methods. |
| Open Source Code | No | The paper mentions using third-party tools like PyTorch, CVXPY, and the MOSEK solver, but does not provide a specific link or explicit statement about releasing the source code for their own methodology. |
| Open Datasets | Yes | We test our theoretical findings on simulated data and imbalanced CIFAR10, MNIST and Fashion MNIST datasets. |
| Dataset Splits | Yes | For each dataset used in our tests (CIFAR10, MNIST and Fashion MNIST), we sample 5, 8, 13, 23, 38, 64, 107, 179, 299, 500 and 5, 100, 120, 140, 160, 180, 200, 220, 300, 500 samples per class for the exponential and modified profiles, respectively. ... In addition, we featurize the standard test sets of each dataset and use them to test the learned predictors. |
| Hardware Specification | No | training is distributed across 4 GPUs. |
| Software Dependencies | Yes | We use PyTorch (Paszke et al., 2019) to run gradient descent... We find the MM, MA, CDT and LA predictors we consider by solving the corresponding margin maximization problems (defined in Section 2.2) using CVXPY (Diamond & Boyd, 2016) with the MOSEK solver (MOSEK ApS, 2023). |
| Experiment Setup | Yes | Fine-tuning is performed using PyTorch (Paszke et al., 2019) while training is conducted for 1000 epochs, employing the SGD optimizer with a batch size of 128, no momentum, no weight decay, and gradient clipping with global norm threshold of 1. The learning rate is set to 1e-4 with cosine learning rate scheduler, and the training is distributed across 4 GPUs. |
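The per-class sample counts quoted for the exponential profile grow geometrically from 5 to 500 across the 10 classes. A minimal sketch that reproduces those numbers, assuming counts are truncated to integers (the truncation rule is my inference, not stated in the quoted text):

```python
def exponential_profile(n_min=5, n_max=500, num_classes=10):
    """Per-class sample counts growing geometrically from n_min to n_max.

    Truncating each intermediate count to an integer is an assumption;
    it matches the counts quoted in the Dataset Splits row.
    """
    step = 1 / (num_classes - 1)
    return [int(n_min * (n_max / n_min) ** (k * step)) for k in range(num_classes)]

print(exponential_profile())  # [5, 8, 13, 23, 38, 64, 107, 179, 299, 500]
```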
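The quoted setup clips gradients to a global-norm threshold of 1 (PyTorch exposes this as `torch.nn.utils.clip_grad_norm_`). A pure-Python sketch of the operation on a flat list of gradient values, for illustration only:

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale gradients so their joint L2 norm is at most max_norm.

    Mirrors the "gradient clipping with global norm threshold of 1" in the
    quoted setup; real frameworks apply the same scale across all parameter
    tensors rather than a single flat list.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return grads  # already within the threshold; leave untouched
    scale = max_norm / total_norm
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0])  # original norm 5, rescaled to norm 1
```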