Continuously Updating Digital Twins using Large Language Models
Authors: Harry Amad, Nicolás Astorga, Mihaela van der Schaar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now demonstrate the empirical performance of CALM-DT. Firstly, we examine simulations in fixed modelling environments, demonstrating state-of-the-art performance (6.1). We also conduct ablation studies to assess the contribution of different components of CALM-DT (6.2). We then showcase CALM-DT's unique ability to adapt to changes in modelling environment without re-design or re-training, demonstrating adaptation to a novel action (6.3), and incorporation of new data (6.4). |
| Researcher Affiliation | Academia | Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom. Correspondence to: Harry Amad <EMAIL>. |
| Pseudocode | No | The paper describes the methodology of CALM-DT in Section 4, detailing an iterative three-stage process: information retrieval, prompt formulation, and generation. Figure 2 provides a visual overview. However, there are no explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured code-like steps. |
| Open Source Code | No | The paper does not provide concrete access to source code for the CALM-DT methodology. It mentions the GitHub repository for a benchmark method (HDTwin) from Holt et al. (2024), but not for the authors' own work. |
| Open Datasets | Yes | For the CF setting, we use 1000 trajectories from the 2008-2013 UK CF registry for training... Since the CF data is not publicly accessible... We split the Hare-Lynx dataset... using the datasets from Bonnaffé & Coulson (2023). We split the Algae-Flagellate-Rotifer dataset... using the datasets from Bonnaffé & Coulson (2023). |
| Dataset Splits | Yes | For the CF setting, we use 1000 trajectories from the 2008-2013 UK CF registry for training, and assess three-year simulation performance. For the NSCLC setting, we generate 500 training samples... We generate validation and testing sets of 100 patients each. We split the Hare-Lynx dataset into nine samples of 10 years each; we set the first six samples as the training set and use the last three samples as the testing set... We split the Algae-Flagellate-Rotifer dataset into 10 samples of 10 days each; we set the first six samples as the training set and use the last four samples as the testing set. |
| Hardware Specification | No | The paper mentions using GPT-4o, GPT-4o Mini, or GPT-3.5 Turbo via the Azure OpenAI Service. While these are specific models/services, they do not specify the underlying hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | Yes | For CALM-DT, we use GPT-4o, accessed via the Azure OpenAI Service with version 2024-02-01... GPT-4o mini (version 2024-10-01-preview), or GPT-3.5 Turbo (version 2024-10-01-preview), all accessed via the Azure OpenAI Service. |
| Experiment Setup | Yes | For CALM-DT, we use GPT-4o... with the temperature τ = 0, and we set Kτ as:... We also set r = 1, l = 3, F = 3, c = 5... We conduct training for 8 epochs with a batch size of 16, learning rate of 5 × 10⁻⁵, and temperature of τ = 0.07, using the AdamW optimizer (Kingma & Ba, 2015) as implemented in PyTorch (Paszke et al., 2019)... DyNODE with a 3-layer MLP, with a hidden dimension of 128, with tanh activation functions, and Xavier weight initialisation... learning rate of 0.01, batch size of 1,000 and early stopping with a patience of 20 for 2,000 epochs. |
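To make the quoted DyNODE architecture details concrete, the following is a minimal dependency-free sketch of a 3-layer MLP with tanh activations and Xavier (Glorot) uniform weight initialisation, matching the hidden dimension of 128 from the Experiment Setup row. The state dimension is a hypothetical placeholder (the excerpt does not state it), and this is an illustrative reconstruction, not the authors' implementation.

```python
import math
import random

random.seed(0)

def xavier_uniform(fan_in, fan_out):
    # Xavier (Glorot) uniform init: weights drawn from U(-limit, limit),
    # where limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def linear(W, b, x):
    # Affine transform: one output per row of W.
    return [sum(w * v for w, v in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def mlp_forward(x, layers):
    # tanh between layers, linear output, as in the quoted DyNODE setup.
    for i, (W, b) in enumerate(layers):
        x = linear(W, b, x)
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

STATE_DIM = 4   # hypothetical state dimension, not given in the excerpt
HIDDEN = 128    # hidden dimension from the quoted setup

shapes = [(STATE_DIM, HIDDEN), (HIDDEN, HIDDEN), (HIDDEN, STATE_DIM)]
layers = [(xavier_uniform(fi, fo), [0.0] * fo) for fi, fo in shapes]

out = mlp_forward([0.1] * STATE_DIM, layers)
print(len(out))  # 4: the network maps a state back to a same-sized derivative
```

In a neural-ODE model such as DyNODE, an MLP like this parameterises the state derivative, so input and output dimensions match; the quoted training settings (lr 0.01, batch size 1,000, early stopping with patience 20) would then drive an ODE-solver-in-the-loop fit.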