EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification
Authors: Ben Dai
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The numerical effectiveness of ENSLOSS compared to fixed-loss methods is demonstrated through experiments on a broad range of 45 pairs of CIFAR10 datasets, the PCam image dataset, and 14 OpenML tabular datasets, with various deep learning architectures. Python repository and source code are available on GitHub. |
| Researcher Affiliation | Academia | Department of Statistics, The Chinese University of Hong Kong. Correspondence to: Ben Dai <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: (Minibatch) Calibrated ensemble SGD; Algorithm 2: Inverse Box-Cox transform of loss-derivatives. |
| Open Source Code | Yes | Python repository and source code are available on GitHub; the paper states 'All Python codes is openly accessible at our GITHUB.' |
| Open Datasets | Yes | Image datasets. We present the empirical results for image benchmark datasets: the CIFAR10 (Krizhevsky et al., 2009) and the Patch Camelyon (PCam; Veeling et al., 2018)... Tabular datasets. We applied a filtering (n ≥ 1000, d ≤ 1000) across all OpenML (Vanschoren et al., 2014) binary classification dense datasets... |
| Dataset Splits | No | The paper frequently refers to 'training' and 'testing' datasets, such as 'significant gap often persisting between the training (close to zero) and testing errors', but does not explicitly provide specific percentages, counts, or methodology for dataset splits (e.g., 80/10/10 split, or specific sample counts for train/validation/test sets) for the datasets used. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It refers to 'deep learning models' and 'neural network architectures' in a general sense. |
| Software Dependencies | No | The paper states 'All Python codes is openly accessible at our GITHUB.', indicating Python is used. However, it does not provide specific version numbers for Python or any other libraries, frameworks, or solvers utilized in the implementation. |
| Experiment Setup | No | The paper mentions that 'The implementation settings for each method are identical', and refers to a 'learning rate γ' and a 'minibatch size B' in Algorithm 1, as well as a hyperparameter 'λ = 0' for the Box-Cox transformation. Table 8 shows 'minimum epochs required for training accuracy to stabilize'. However, it does not provide specific numerical values for critical hyperparameters such as the learning rate, the exact batch size used for the experiments, or the total number of epochs for the main results presented in Section 4. |
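For reference, Algorithm 2's named operation builds on the standard Box-Cox transform and its inverse. The sketch below shows only the textbook formula with the λ = 0 case the paper mentions; it is not the paper's exact Algorithm 2, which applies the transform to loss-derivatives with details not reproduced in this report.

```python
import numpy as np

def inverse_box_cox(y, lam=0.0):
    """Invert the Box-Cox transform.

    Forward transform: y = (x**lam - 1) / lam for lam != 0, y = log(x) for lam = 0.
    Inverse:           x = (lam * y + 1) ** (1 / lam), or x = exp(y) when lam = 0.

    Standard formula only; the paper's Algorithm 2 applies a variant to
    loss-derivatives and may differ in its handling of edge cases.
    """
    y = np.asarray(y, dtype=float)
    if lam == 0.0:
        return np.exp(y)
    return np.power(lam * y + 1.0, 1.0 / lam)

# Round-trip check against the forward transform, with lam = 0 as in the
# paper's reported hyperparameter setting.
x = np.array([0.5, 1.0, 2.0])
assert np.allclose(inverse_box_cox(np.log(x), lam=0.0), x)
```

The λ = 0 branch reduces the inverse to a plain exponential, which is why reports of 'λ = 0' effectively mean the loss-derivatives are recovered via `exp`.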