Neural Context Flows for Meta-Learning of Dynamical Systems

Authors: Roussel Desmond Nzoyem, David Barton, Tom Deakin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that NCF achieves state-of-the-art out-of-distribution performance on 5 out of 6 linear and non-linear benchmark problems. Through extensive experiments, we explore the flexible model architecture of NCF and the encoded representations within the learned context vectors. Our findings highlight the potential implications of NCF for foundational models in the physical sciences, offering a promising approach to improving the adaptability and generalization of NODEs in various scientific applications.
Researcher Affiliation | Academia | Roussel Desmond Nzoyem, School of Computer Science, University of Bristol (EMAIL); David A.W. Barton, School of Engineering Mathematics and Technology, University of Bristol (EMAIL); Tom Deakin, School of Computer Science, University of Bristol (EMAIL)
Pseudocode | Yes | Algorithm 1: Proximal Alternating Minimization; Algorithm 2: Sequential Adaptation of NCF
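Algorithm 1 alternates proximally regularized updates between the shared model weights and the per-environment context vectors. As a rough illustration of the mechanics only (not the paper's algorithm, which operates on full networks and batched trajectories), the toy sketch below alternates proximal gradient steps on two scalar variables; the objective, step sizes, and iteration counts are all invented for the example:

```python
# Toy sketch of proximal alternating minimization: `theta` stands in for
# the shared weights and `xi` for a context vector. Each block minimizes
# the loss plus a proximal term (beta/2)*(. - previous)^2 with the other
# variable frozen. All numbers here are illustrative, not the paper's.

def loss(theta, xi):
    # Hypothetical smooth joint objective standing in for the NCF loss.
    return (theta - xi) ** 2 + 0.1 * theta ** 2 + 0.1 * (xi - 1.0) ** 2

def grad(f, x, eps=1e-6):
    # Central finite difference, enough for a scalar demo.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def pam(theta=0.0, xi=0.0, beta=10.0, lr=0.05, outer=200, inner=5):
    for _ in range(outer):
        theta_prev = theta
        for _ in range(inner):  # proximal step on theta, xi frozen
            g = grad(lambda t: loss(t, xi) + beta / 2 * (t - theta_prev) ** 2, theta)
            theta -= lr * g
        xi_prev = xi
        for _ in range(inner):  # proximal step on xi, theta frozen
            g = grad(lambda x: loss(theta, x) + beta / 2 * (x - xi_prev) ** 2, xi)
            xi -= lr * g
    return theta, xi
```

The proximal terms keep each block's update close to its previous iterate, which is what stabilizes the alternation; larger `beta` means smaller, safer steps per outer iteration.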
Open Source Code | Yes | ETHICS STATEMENT: While the benefits of NCFs are evidenced in Section 5.1, their negative impacts should not be neglected. For instance, malicious deployment of such adaptable models in scenarios they were not designed for could lead to serious adverse outcomes. With that in mind, our code, data, and models are openly available at https://github.com/ddrous/ncflow. E EXAMPLE IMPLEMENTATION OF NCFS: A highly performant JAX implementation (Bradbury et al., 2018) of our algorithms is available at https://github.com/ddrous/ncflow. We provide below a few central pieces of our codebase using the ever-growing JAX ecosystem, in particular Optax (DeepMind et al., 2020) for optimization and Equinox (Kidger & Garcia, 2021) for neural network definition. A PyTorch codebase is also made available at https://github.com/ddrous/ncflow-torch.
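As a dependency-light illustration of the context-conditioned vector-field idea (not the repository's Equinox code), the sketch below embeds the state and an environment's context vector with separate networks, combines them in a main network to produce dx/dt, and advances the state with a classic RK4 step. All layer sizes and the integrator choice are invented for the example:

```python
import numpy as np

# NumPy stand-in for a context-conditioned vector field dx/dt = f(x, xi).
# The state x and the per-environment context xi are embedded separately,
# then a main network maps the joint embedding to the time derivative.
rng = np.random.default_rng(0)

def mlp(sizes):
    # Random small weights; a real implementation would train these.
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def forward(params, z):
    for i, (W, b) in enumerate(params):
        z = z @ W + b
        if i < len(params) - 1:
            z = np.tanh(z)
    return z

state_net = mlp([2, 32, 16])   # embeds the state x (dim 2, illustrative)
ctx_net = mlp([256, 32, 16])   # embeds the context xi (d_xi = 256)
main_net = mlp([32, 32, 2])    # joint embedding -> dx/dt

def vector_field(x, xi):
    h = np.concatenate([forward(state_net, x), forward(ctx_net, xi)])
    return forward(main_net, h)

def rk4_step(x, xi, dt=0.01):
    # One classic fourth-order Runge-Kutta step of the learned dynamics.
    k1 = vector_field(x, xi)
    k2 = vector_field(x + dt / 2 * k1, xi)
    k3 = vector_field(x + dt / 2 * k2, xi)
    k4 = vector_field(x + dt * k3, xi)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```

Swapping in a new environment then amounts to swapping `xi` while every network stays fixed, which is the adaptation mechanism the paper exploits.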
Open Datasets | Yes | This need for shared datasets and APIs has motivated the Gen-Dynamics open-source initiative, our third and final contribution with this paper. Further details, along with the data generation process, are given in Appendix B. B.1 GEN-DYNAMICS: Given the lack of benchmark consistency in Scientific Machine Learning (Massaroli et al., 2020), we launched Gen-Dynamics: https://github.com/ddrous/gen-dynamics. This is a call for fellow authors to upload their metrics and datasets, synthetic or otherwise, while following a consistent interface.
Dataset Splits | Yes | In the context of OoD generalization, for instance, we suggest the dataset be split into four parts: (1) train: for in-domain meta-training; (2) test: for in-domain evaluation; (3) ood_train: for out-of-distribution adaptation to new environments (meta-testing); (4) ood_test: for OoD evaluation. For LV, GO, GS, and NS, we reproduce the original guidelines set in (Kirchmeyer et al., 2022), while exposing the data for ODE and PDE problems alike via a common interface. This experiment explores the SP problem discussed in Fig. 1. During meta-training, we use 25 environments with the gravity g regularly spaced in [2, 24]. Each of these environments contains only 4 trajectories with the initial conditions xᵉᵢ(0) ∼ [U(−π/3, π/3), U(−1, 1)]ᵀ. During adaptation, we interpolate to 2 new environments with g ∈ {10.25, 14.75}, each with 1 trajectory. For both training and adaptation testing scenarios, we generate 32 separate trajectories.
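A hypothetical loader sketch of the four-way split described above; the actual Gen-Dynamics on-disk format may differ. The SP environment and trajectory counts come from the text, while the trajectory length and state dimension are placeholders:

```python
# Illustrative four-way split for an OoD benchmark. Each split maps to
# nested data of shape (n_environments, n_trajectories, n_steps, dim);
# zeros stand in for data that would normally be loaded from disk.
SPLITS = ("train", "test", "ood_train", "ood_test")

def make_dataset(n_envs, n_trajs, n_steps, dim):
    return [[[[0.0] * dim for _ in range(n_steps)]
             for _ in range(n_trajs)] for _ in range(n_envs)]

def load_benchmark(n_steps=100, dim=2):
    # SP counts from the text: 25 meta-training environments with 4
    # trajectories each, 2 adaptation environments with 1 trajectory,
    # and 32 evaluation trajectories per environment in both test splits.
    return {
        "train": make_dataset(25, 4, n_steps, dim),
        "test": make_dataset(25, 32, n_steps, dim),
        "ood_train": make_dataset(2, 1, n_steps, dim),
        "ood_test": make_dataset(2, 32, n_steps, dim),
    }
```

Keeping the adaptation (ood_train) and evaluation (ood_test) environments aligned but with different trajectory budgets is what lets few-shot adaptation be scored fairly.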
Hardware Specification | Yes | Our main workstation for training and adaptation was fitted with an Nvidia GeForce RTX 4080 graphics card, which was used for the SP, LV, SM, and NS problems. Additionally, we used an RTX 3090 GPU of the same generation for the GO problem, and an NVIDIA A100 Tensor Core GPU for the GS problem, as its CNN-based architecture was more memory-intensive.
Software Dependencies | No | In addition to the training hardware and deep-learning frameworks (JAX for NCFs, and PyTorch for CAVIA and CoDA) that made apples-to-apples comparison challenging, our network architectures varied slightly across methods, as the subsections below highlight. Using TorchDiffEq (Chen, 2018), CoDA backpropagates gradients through the numerical integrator; we incorporated the same open-source package and adjusted other hyperparameters accordingly to match NCF and CoDA on parameter count and other relevant aspects for a fair comparison.
Experiment Setup | Yes | We use the three-network architecture depicted in Fig. 2b to suitably process the state and context variables. The dimension of the context vector, the context pool's size and filling strategy, and the numerical integration scheme vary across problems. For instance, we set dξ = 1024 for LV, dξ = 202 for NS, and dξ = 256 for all other problems; while p = 2 for LV and SM, p = 4 for LV and GO, and p = 3 for all PDE problems. Other hyperparameters are carefully discussed in Appendix D. For regularization of the loss function Eq. (10), we set λ1 = 10⁻³ for all problems, but λ2 = 0 for ODE problems and λ2 = 10⁻³ for PDE problems. For NCF-t2 we always used a proximal coefficient β = 10, except for LV where β = 100. The initial learning rates were as follows: 3 × 10⁻⁴ for LV and NS, 10⁻³ for GO, BT, and GS, and 10⁻⁴ for SM. The learning rate was kept constant throughout the various trainings, except for BT, GS, and NS, where it was multiplied by a factor (0.1, 0.5, and 0.1, respectively) after a third of the total number of training steps, and again by the same factor at two-thirds completion. The same initial learning rates were used during adaptation, with the number of iterations typically set to 1500. Table 8: Hyperparameters for NCF-t1. Table 9: Hyperparameters for NCF-t2.
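The staged decay described above (scale the learning rate by a per-problem factor after one third of training, and again at two thirds) can be sketched without any framework dependency; Optax's `piecewise_constant_schedule` provides equivalent behavior in the actual JAX training loop:

```python
# Piecewise-constant learning-rate schedule matching the description:
# the initial rate is multiplied by `factor` at total_steps // 3 and by
# `factor` again at 2 * total_steps // 3. Values below are from the
# text (e.g. init_lr = 1e-3, factor = 0.1 for BT).

def lr_schedule(step, total_steps, init_lr, factor):
    if step < total_steps // 3:
        return init_lr
    if step < 2 * total_steps // 3:
        return init_lr * factor
    return init_lr * factor ** 2
```

For GS the same shape applies with factor 0.5, so the rate only halves at each boundary rather than dropping an order of magnitude.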