When to retrain a machine learning model

Authors: Florence Regol, Leo Schwinn, Kyle Sprague, Mark Coates, Thomas Markovich

ICML 2025

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting excerpt from the LLM response.
Research Type: Experimental
LLM Response: "Our experiments addressing classification tasks show that the method consistently outperforms existing baselines on 7 datasets. ... 5. Experiments. Evaluation Metrics: The performance of a retraining decision method is evaluated based on both the average performance and the total retraining cost. ... Table 1. AUC of the combined performance/retraining cost metric Cα(θ), computed over a range of α values, for all datasets. ... Ablation study: importance of uncertainty. ... Sensitivity study: robustness to wrong α."
Researcher Affiliation: Collaboration
LLM Response: "1 McGill University, Canada; 2 Block, Toronto, Canada; 3 Technical University of Munich, Germany. Correspondence to: Florence Regol <EMAIL>."
Pseudocode: No
LLM Response: "The paper describes the methodology using mathematical equations and descriptive text, but does not include a clearly labeled pseudocode or algorithm block."
Open Source Code: No
LLM Response: "The paper does not contain an explicit statement about the release of source code for the methodology, nor a link to a code repository."
Open Datasets: Yes
LLM Response: "We present results on synthetic and real datasets. For the real datasets, we use datasets with a timestamp for each sample and partition the data in time to create a sequence of datasets D0, D1, .... (ii) the airplane dataset (Gomes et al., 2017), ... (iii) yelpCHI (Dou et al., 2020), ... (iv) epicgames (Ozmen et al., 2024), ... iWildCam (Beery et al., 2020). ... For the synthetic dataset, we follow Mahadevan & Mathioudakis (2024) to generate two 2D datasets with covariate shift (Gauss) and concept drift (circles) (Pesaranghader et al., 2016)."
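The two synthetic settings quoted above can be illustrated with a minimal stdlib-only sketch. This is not the paper's generator (which follows Mahadevan & Mathioudakis, 2024); all function names and parameter values here are illustrative assumptions showing the distinction between covariate shift (the input distribution moves) and concept drift (the labeling rule moves).

```python
import math
import random

def make_gauss(n, mean, seed=0):
    """Two-class 2D Gaussian data; shifting `mean` over time
    produces covariate shift (inputs move, labeling rule fixed)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        cx = mean[0] + (1.5 if label else -1.5)  # class-dependent center
        X.append((rng.gauss(cx, 1.0), rng.gauss(mean[1], 1.0)))
        y.append(label)
    return X, y

def make_circles(n, boundary_radius, seed=0):
    """Points labeled by whether they fall inside a circle; moving
    `boundary_radius` over time produces concept drift (inputs fixed,
    labeling rule moves)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        r, theta = 2.0 * rng.random(), 2.0 * math.pi * rng.random()
        X.append((r * math.cos(theta), r * math.sin(theta)))
        y.append(int(r < boundary_radius))
    return X, y

# A sequence of datasets D0, D1, ... with gradually shifting parameters:
gauss_seq = [make_gauss(200, mean=(0.3 * t, 0.0), seed=t) for t in range(5)]
circle_seq = [make_circles(200, boundary_radius=1.0 + 0.1 * t, seed=t) for t in range(5)]
```

Each element of the two sequences is one time step's dataset, mirroring the D0, D1, ... structure described for the real data.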
Dataset Splits: Yes
LLM Response: "For the real datasets, we use datasets with a timestamp for each sample and partition the data in time to create a sequence of datasets D0, D1, .... For each trial, we sample a different sequence of length w + T within the complete dataset sequence available. ... We use a similar setup to the one followed in our experiment, setting the offline window size w = 7, evaluating over an online phase of T = 8 steps, and presenting results over 10 trials (see Table 11)."
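The split protocol in that excerpt can be sketched in a few lines: partition timestamped samples into a time-ordered sequence, then draw a contiguous sub-sequence of length w + T per trial. This is an assumption-laden sketch, not the authors' code; the dict keys and chunking scheme are illustrative.

```python
import random

def time_partition(samples, n_chunks):
    """Sort timestamped samples and split them into a time-ordered
    sequence of datasets D0, D1, ... of roughly equal size."""
    samples = sorted(samples, key=lambda s: s["timestamp"])
    k, r = divmod(len(samples), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < r else 0)
        chunks.append(samples[start:end])
        start = end
    return chunks

def sample_trial(chunks, w, T, rng):
    """One trial: a random contiguous sub-sequence of length w + T,
    split into an offline window (first w) and an online phase (last T)."""
    start = rng.randrange(len(chunks) - (w + T) + 1)
    seq = chunks[start:start + w + T]
    return seq[:w], seq[w:]

# Toy example with w = 7 and T = 8, the values quoted above:
data = [{"timestamp": t, "x": t % 3} for t in range(2000)]
chunks = time_partition(data, n_chunks=20)
offline, online = sample_trial(chunks, w=7, T=8, rng=random.Random(0))
```

Repeating `sample_trial` with different seeds gives the independent trials over which results are averaged.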
Hardware Specification: Yes
LLM Response: "Our architecture involves using a pretrained vision model. ... Training was conducted using 4 H100 GPUs for 2 days."
Software Dependencies: No
LLM Response: "For µϕ(ri,j), we use a linear regression model, Elastic Net CV (Zou & Hastie, 2005), from the scikit-learn library. All other optimization parameters are set to default choices from the scikit-learn library. ... We follow Mahadevan & Mathioudakis (2024) and use the scikit-multiflow library (Montiel et al., 2018) version of the airplane dataset. ... pretrained vision models made available from timm"
Experiment Setup: Yes
LLM Response: "We set the confidence threshold of our UPF algorithm to δ = 95%, as it is a standard value used for confidence intervals. For µϕ(ri,j), we use a linear regression model, Elastic Net CV (Zou & Hastie, 2005), from the scikit-learn library. All other optimization parameters are set to default choices from the scikit-learn library. ... The fine-tuning process uses the Adam optimizer with a fixed learning rate of 10^-4 and a weight decay parameter of 10^-5."
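The regression component of that setup maps directly onto scikit-learn's `ElasticNetCV` with default parameters, as the excerpt states. The sketch below shows only that fitting step on synthetic stand-in data; the feature matrix, targets, and the `DELTA` constant are illustrative assumptions, and the paper's UPF algorithm and Adam fine-tuning (lr 10^-4, weight decay 10^-5) are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

DELTA = 0.95  # confidence threshold delta = 95%, per the quoted setup

# Stand-in regression problem: features r_ij -> observed performance.
rng = np.random.default_rng(0)
R = rng.normal(size=(64, 4))
perf = R @ np.array([0.5, -0.2, 0.1, 0.0]) + 0.05 * rng.normal(size=64)

# Elastic Net with cross-validated regularization; all other
# parameters left at scikit-learn defaults, as stated in the paper.
mu_phi = ElasticNetCV()
mu_phi.fit(R, perf)
pred = mu_phi.predict(R)
```

`ElasticNetCV` selects the regularization strength by internal cross-validation, which matches the paper's choice of leaving optimization parameters at library defaults.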