When to retrain a machine learning model
Authors: Florence Regol, Leo Schwinn, Kyle Sprague, Mark Coates, Thomas Markovich
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments addressing classification tasks show that the method consistently outperforms existing baselines on 7 datasets. ... 5. Experiments Evaluation Metrics The performance of a retraining decision method is evaluated based on both the average performance and the total retraining cost. ... Table 1. AUC of the combined performance/retraining cost metric Cα(θ), computed over a range of α values, for all datasets. ... Ablation study: Importance of uncertainty ... Sensitivity study: Robustness to wrong α |
| Researcher Affiliation | Collaboration | ¹McGill University, Canada; ²Block, Toronto, Canada; ³Technical University of Munich, Germany. Correspondence to: Florence Regol <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using mathematical equations and descriptive text, but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We present results on synthetic and real datasets. For the real datasets, we use datasets with a timestamp for each sample and partition the data in time to create a sequence of datasets D0,D1,.... ... (ii) the airplane dataset (Gomes et al., 2017), ... (iii) YelpChi (Dou et al., 2020), ... and (iv) epicgames (Ozmen et al., 2024), ... iWildCam (Beery et al., 2020) ... For the synthetic dataset, we follow Mahadevan & Mathioudakis (2024) to generate two 2D datasets with covariate shift (Gauss) and concept drift (circles) (Pesaranghader et al., 2016). |
| Dataset Splits | Yes | For the real datasets, we use datasets with a timestamp for each sample and partition the data in time to create a sequence of datasets D0,D1,.... For each trial, we sample a different sequence of length w + T within the complete dataset sequence available. ... We use a similar setup to the one followed in our experiment, setting the offline window size w = 7, evaluating over an online phase of T = 8 steps, and presenting results over 10 trials (See table 11). |
| Hardware Specification | Yes | Our architecture involves using a pretrained vision model, ... Training was conducted using 4 H100 GPUs for 2 days. |
| Software Dependencies | No | For µϕ(ri,j), we use a linear regression model, Elastic Net CV (Zou & Hastie, 2005), from the scikit-learn library. All other optimization parameters are set to default choices from the scikit-learn library. ... We follow Mahadevan & Mathioudakis (2024) and use the scikit-multiflow library (Montiel et al., 2018) version of the airplane dataset. ... pretrained vision models made available from timm. No library version numbers are reported. |
| Experiment Setup | Yes | We set the confidence threshold of our UPF algorithm to δ = 95%, as it is a standard value used for confidence intervals. For µϕ(ri,j), we use a linear regression model, Elastic Net CV (Zou & Hastie, 2005), from the scikit-learn library. All other optimization parameters are set to default choices from the scikit-learn library. ... The fine-tuning process uses the Adam optimizer with a fixed learning rate of 10^-4 and a weight decay parameter of 10^-5. |
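The dataset-splits row quotes the paper's protocol: partition timestamped data in time into a sequence D0, D1, ..., then, per trial, sample a contiguous sub-sequence of length w + T (offline window w = 7, online phase T = 8). A minimal sketch of that windowing, assuming equal-width time bins and `(timestamp, sample)` pairs (the paper's exact binning is not quoted, so both helper names are illustrative):

```python
import random

def partition_by_time(samples, num_bins):
    """Split (timestamp, x) pairs into num_bins equal-width time bins D0, D1, ..."""
    ts = [t for t, _ in samples]
    lo, hi = min(ts), max(ts)
    width = (hi - lo) / num_bins or 1  # guard against all-equal timestamps
    bins = [[] for _ in range(num_bins)]
    for t, x in samples:
        i = min(int((t - lo) / width), num_bins - 1)  # clamp the max timestamp
        bins[i].append(x)
    return bins

def sample_trial(datasets, w, T, rng=random):
    """Pick a random contiguous sub-sequence of length w + T:
    the first w datasets form the offline window, the next T the online phase."""
    start = rng.randrange(len(datasets) - (w + T) + 1)
    seq = datasets[start:start + w + T]
    return seq[:w], seq[w:]
```

With w = 7 and T = 8 as quoted, each trial requires a dataset sequence of at least 15 partitions; the 10 trials per dataset each draw a different start offset.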
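The setup row states that µϕ(ri,j) is an Elastic Net CV regressor from scikit-learn with default parameters. A self-contained sketch of that component on toy data (the features and targets here are stand-ins; the paper's feature construction for ri,j is not quoted):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Toy stand-in regression problem; in the paper, the regressor predicts
# model performance from retraining-decision features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# ElasticNetCV with scikit-learn defaults, matching the quoted setup
# ("all other optimization parameters are set to default choices").
model = ElasticNetCV().fit(X, y)
preds = model.predict(X)
```

The quoted fine-tuning details (Adam, learning rate 10^-4, weight decay 10^-5) apply to the separate pretrained-vision-model component and are not reproduced here.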