Elliptic Loss Regularization

Authors: Ali Hasan, Haoming Yang, Yuting Ng, Vahid Tarokh

ICLR 2025

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. "Numerical experiments confirm the promise of the proposed regularization technique. We present empirical results over a comprehensive set of tasks related to the proposed regularization scheme. We first empirically evaluate the bound derived from Proposition 1 and verify that the proposed regularization retains the necessary loss within the domain of interest. Next, we benchmark the elliptic regularization against another popular regularization scheme, mixup (Zhang et al., 2017), and its variants on balanced classification and in-distribution regression."
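The mixup baseline referenced above trains on convex combinations of pairs of examples rather than on raw samples. A minimal sketch of the interpolation step (function name and signature are illustrative; labels are assumed to be one-hot or continuous so they can be mixed linearly):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """mixup (Zhang et al., 2017): mix two examples with a Beta-distributed weight.

    lambda ~ Beta(alpha, alpha); the same lambda mixes inputs and labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    # Convex combination of both the inputs and the targets
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

With `alpha = 1.0` the mixing weight is uniform on [0, 1]; larger `alpha` concentrates it near 0.5, producing more aggressive interpolation.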
Researcher Affiliation: Collaboration. Ali Hasan (1, 2), Haoming Yang (2), Yuting Ng (2), Vahid Tarokh (2). Affiliations: (1) Machine Learning Research, Morgan Stanley; (2) Department of Electrical and Computer Engineering, Duke University.
Pseudocode: Yes. "A ALGORITHM DETAILS: To fully supplement the algorithmic contributions in the main paper, we detail the elliptic training procedure that solves the PDE of equation 1 using Brownian bridges in Algorithm 1." The appendix provides Algorithm 1 (elliptic training algorithm) and Algorithm 2 (sampling a Brownian bridge).
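The paper's Algorithm 2 samples Brownian bridges between points; its exact steps are not reproduced in this report, but a standard discretized bridge sampler looks like the following sketch (function name, signature, and the unit time horizon are assumptions):

```python
import numpy as np

def sample_brownian_bridge(x0, x1, n_steps=10, sigma=1.0, rng=None):
    """Sample a discretized Brownian bridge from x0 (t = 0) to x1 (t = 1).

    Uses the representation B(t) = (1 - t) x0 + t x1 + sigma (W(t) - t W(1)),
    where W is a standard Brownian motion on [0, 1]; x0, x1 are 1-D arrays.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x1 = np.atleast_1d(np.asarray(x1, dtype=float))
    t = np.linspace(0.0, 1.0, n_steps + 1)
    # Brownian increments on the grid; cumulative sum gives W(t), with W(0) = 0
    dW = rng.normal(0.0, np.sqrt(np.diff(t))[:, None], (n_steps,) + x0.shape)
    W = np.concatenate([np.zeros((1,) + x0.shape), np.cumsum(dW, axis=0)])
    # Linear interpolation of the endpoints plus noise pinned to zero at t = 0, 1
    return ((1 - t)[:, None] * x0 + t[:, None] * x1
            + sigma * (W - t[:, None] * W[-1]))
```

By construction the first and last rows of the returned path equal `x0` and `x1` exactly, which is the pinning property the bridge-based training procedure relies on.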
Open Source Code: No. The paper discusses preprocessing pipelines from other works (C-Mixup, UMIX, MixupE) and states their availability, but it does not provide an explicit statement or link for the code of the method described in this paper (Elliptic Loss Regularization).
Open Datasets: Yes. "We benchmark on CIFAR-10 and CIFAR-100 with classification error rate in Table 1; we also evaluate on Tiny-ImageNet-200 and report the top-1 and top-5 accuracy in Table 2. For regression, we benchmarked the elliptic regularization against the C-Mixup algorithm on two real-world tabular datasets: Airfoil Self-Noise (Airfoil) (Brooks et al., 2014) and NO2 (Kooperberg, 1997). We benchmark elliptic training with importance-weighting drift b = ℓ(f(x), y) in equation 5 on Waterbirds (Koh et al., 2021) and CelebA (Sagawa et al., 2019). We also evaluate on the Camelyon17 (Bandi et al., 2018) dataset to examine robustness under domain shifts. Communities and Crime (Crime) (Redmond, 2009) and SkillCraft1 Master Table (SkillCraft) (Blair et al., 2013) are real-world tabular datasets where domain shifts exist between the training and testing data. We focus on the real-world medical dataset MedMNIST (Yang et al., 2023a)."
Dataset Splits: Yes. Per-dataset details:
- Airfoil (Airfoil Self-Noise): 1003, 300, and 200 instances for training, validation, and testing, respectively.
- NO2 (included in StatLib; Kooperberg, 1997): 200, 200, and 100 instances for training, validation, and testing, respectively.
- Camelyon17 (Bandi et al., 2018): the official split scheme of this dataset is applied.
- SkillCraft (SkillCraft1 Master): 4, 1, and 3 domains for training, validation, and testing, containing 1878, 806, and 711 instances, respectively.
- Crime (Communities and Crime): training, validation, and test sets of 31, 6, and 9 disjoint domains, with 1390, 231, and 373 instances, respectively.
- RCF-MNIST: an 80-20 split between training and testing data.
- MedMNIST: the official 70-10-20 split is applied for each dataset to construct the training, validation, and testing subsets, respectively.
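For datasets where the report quotes fractional splits (e.g., the 70-10-20 convention above) rather than an official partition, a reproducible random index split can be sketched as follows (helper name, fixed seed, and fractions are illustrative, not the paper's code):

```python
import numpy as np

def train_val_test_split(n, fracs=(0.7, 0.1, 0.2), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test by fractions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(fracs[0] * n)
    n_val = int(fracs[1] * n)
    # Remaining indices form the test set
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Fixing the seed makes the partition deterministic across runs, which matters when comparing regularizers on the same splits.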
Hardware Specification: No. The paper does not mention the specific hardware (e.g., GPU or CPU models, or cloud computing instance types) used to conduct the experiments.
Software Dependencies: No. The paper mentions using a "PyTorch implementation of ResNet50" and a "PyTorch implementation of DenseNet121" but does not specify version numbers for PyTorch or any other software dependency.
Experiment Setup: Yes. "Table 12: Architecture and hyperparameter settings for each dataset/experiment. Optim stands for optimizer; Mom stands for momentum; WD stands for weight decay; and LR stands for learning rate. Note: * means the learning rate is annealed by a factor of 10 at epochs 100 and 150. For all experiments, parameter ξ = 1."
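The annealing noted above (divide the learning rate by 10 at epochs 100 and 150) is a standard multi-step schedule; in PyTorch it corresponds to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)`. A dependency-free sketch of the same schedule (function name is illustrative):

```python
def annealed_lr(base_lr, epoch, milestones=(100, 150), gamma=0.1):
    """Step-decay schedule: multiply base_lr by gamma at each passed milestone."""
    factor = gamma ** sum(epoch >= m for m in milestones)
    return base_lr * factor
```

For example, with `base_lr = 0.1` the schedule yields 0.1 for epochs 0-99, 0.01 for epochs 100-149, and 0.001 from epoch 150 onward.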