Meta-Learning Adaptive Loss Functions

Authors: Christian Raymond, Qi Chen, Bing Xue, Mengjie Zhang

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that our proposed method consistently outperforms the cross-entropy loss and offline loss function learning techniques on a diverse range of neural network architectures and datasets. In this section, the experimental setup for evaluating AdaLFL is presented. In summary, experiments are conducted across seven open-access datasets and multiple well-established network architectures. The performance of AdaLFL is assessed against three benchmark methods.
Researcher Affiliation | Academia | Christian Raymond (EMAIL), Victoria University of Wellington; Qi Chen (EMAIL), Victoria University of Wellington; Bing Xue (EMAIL), Victoria University of Wellington; Mengjie Zhang (EMAIL), Victoria University of Wellington
Pseudocode | Yes | Algorithm 1: Loss Function Initialization (Offline); Algorithm 2: Loss Function Adaptation (Online); Algorithm 3: Learning Rate Initialization (Offline); Algorithm 4: Learning Rate Adaptation (Online)
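At the control-flow level, these four algorithms pair an offline initialization phase with per-step online adaptation. The sketch below is a schematic reading of that structure only; all names are placeholders, not the paper's implementation:

```
# Offline phase (Algorithms 1 and 3): initialize the parametric loss
# function (and learning rates) before base training begins.
for step in 1..S_init:
    meta-update loss-function parameters φ on a sampled batch

# Online phase (Algorithms 2 and 4): interleave one meta step with
# each base-network update during training proper.
for each base training step:
    update base network weights θ using the learned loss M_φ
    meta-update φ (and learning rates) with a smaller meta learning rate
```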
Open Source Code | Yes | All experiments are implemented in PyTorch (Paszke et al., 2017) and Higher (Grefenstette et al., 2019), and the code is available at the GitHub repository: https://github.com/Decadz/Online-Loss-Function-Learning
Open Datasets | Yes | Following the established literature on loss function learning, the regression datasets Communities and Crime (Redmond, 2009), Diabetes (Efron et al., 2004), and California Housing (Pace & Barry, 1997) are used as a simple domain to illustrate the capabilities of the proposed method. Following this, the classification datasets MNIST (LeCun et al., 1998), CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011) are employed to assess the performance of AdaLFL and determine whether the results generalize to larger, more challenging tasks.
Dataset Splits | Yes | The original training-testing partitioning is used for all datasets, with 10% of the training instances allocated for validation.
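As a concrete illustration of that 90/10 split, a minimal sketch (the dataset size, seed, and function name are illustrative, not taken from the paper):

```python
import random

def train_val_split(n_train, val_fraction=0.1, seed=0):
    """Shuffle training indices and hold out val_fraction for validation."""
    idx = list(range(n_train))
    random.Random(seed).shuffle(idx)
    n_val = int(n_train * val_fraction)
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)

# e.g. CIFAR-10 ships with 50,000 training instances:
train_idx, val_idx = train_val_split(50000)
```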
Hardware Specification | Yes | Table 2: Average run-time of the entire learning process (end-to-end) for each benchmark method. Each algorithm is run on a single Nvidia RTX A5000, and results are reported in hours.
Software Dependencies | No | The paper states "All experiments are implemented in PyTorch (Paszke et al., 2017) and Higher (Grefenstette et al., 2019)", but it does not specify version numbers for PyTorch or Higher. The years in parentheses refer to the publication dates of the cited papers, not the software versions used.
Experiment Setup | Yes | In the inner loop, all regression models are trained using stochastic gradient descent (SGD) with a base learning rate of α = 0.001. Classification models are trained with SGD using a base learning rate of α = 0.01, and on CIFAR-10, CIFAR-100, and SVHN, Nesterov momentum 0.9 and weight decay 0.0005 are applied. ... To initialize Mϕ, Sinit = 2500 steps are taken in offline mode with a meta learning rate of η = 1e-3. In contrast, in online mode, a meta learning rate of η = 1e-5 is used (note: a high meta learning rate in online mode can cause a jittering effect in the loss function, which can lead to training instability). For meta-optimization, the Adam optimizer (Kingma & Ba, 2015) is used in the outer loop for both initialization and online adaptation.
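For reference, the inner-loop update used in the classification experiments (SGD with Nesterov momentum 0.9 and weight decay 0.0005) corresponds to the scalar sketch below, written to mirror PyTorch's SGD formulation; the function name and scalar form are illustrative only:

```python
def sgd_nesterov_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD step with L2 weight decay and Nesterov momentum (scalar form)."""
    g = grad + weight_decay * w             # weight decay folded into the gradient
    velocity = momentum * velocity + g      # momentum buffer update
    w = w - lr * (g + momentum * velocity)  # Nesterov look-ahead step
    return w, velocity

# One step from w = 1.0 with gradient 0.5 and zero initial velocity:
w, v = sgd_nesterov_step(1.0, 0.5, 0.0)
```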