Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Benefits of Active Data Collection in Operator Learning

Authors: Unique Subedi, Ambuj Tewari

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical studies comparing our active data collection strategy with passive data collection (random sampling) for learning solution operators for the Poisson and Heat Equations. Figures (1) and (2) show the testing error as a function of the training sample size.
Researcher Affiliation | Academia | Department of Statistics, University of Michigan, Ann Arbor, USA. Correspondence to: Unique Subedi <EMAIL>.
Pseudocode | No | The paper describes the data collection strategy and estimator in Section 3.1 and Appendix A.1 but does not present them in a structured pseudocode or algorithm block.
Open Source Code | Yes | Our code is available at https://github.com/unique-subedi/active-operator-learning.
Open Datasets | No | For the passive data collection strategy, the input functions f are independently sampled as f ∼ GP(0, 50^2(∇^2 + I)^-2), where GP denotes Gaussian Process... For each initial condition, we use the finite difference method with forward-time discretization to compute the solution u1 at t = 1. This is a description of how synthetic data is generated, not a publicly available dataset with a link or citation.
Dataset Splits | Yes | For testing, 100 additional source functions f ∼ GP(0, 50^2(∇^2 + I)^-2) are generated... All estimators are evaluated on a test set of size 100, drawn from the same distribution as the training data.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions only the grid size for computations (e.g., '64x64 grid') and no specific processor or GPU models.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions using 'Fourier Neural Operator (FNO)' and 'finite-difference method' but without version details.
Experiment Setup | Yes | The FNO model has four Fourier layers and N/2 Fourier modes, where N denotes the number of grid points along each spatial dimension. In our experiments, all computations are carried out on a 64x64 grid, so N = 64... This is done using 1000 time discretization steps on a 64x64 grid. For our experiments, we set τ = 10^-2.
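
The heat-equation data generation quoted above (forward-time finite differences, 1000 time steps, 64x64 grid, solution taken at t = 1) can be illustrated with a minimal sketch. This is not the authors' code; their implementation is in the linked repository. Several details here are assumptions not stated in the quoted text: τ is treated as the diffusivity in u_t = τ ∇^2 u, the domain is the unit square, and boundary conditions are periodic. The function name `heat_ftcs` and the test initial condition are hypothetical and chosen only for illustration.

```python
import numpy as np

def heat_ftcs(u0, tau=1e-2, t_final=1.0, n_steps=1000, length=1.0):
    """Advance u_t = tau * laplacian(u) from t = 0 to t_final using
    forward-time, centered-space finite differences.

    Assumptions (not from the source): periodic boundaries, square domain
    of side `length`, and tau interpreted as the diffusivity.
    """
    n = u0.shape[0]
    dx = length / n          # grid spacing on a periodic grid
    dt = t_final / n_steps   # 1000 steps to reach t = 1 by default
    # The explicit scheme is stable in 2D only if tau * dt / dx^2 <= 1/4.
    assert tau * dt / dx**2 <= 0.25, "time step too large for stability"
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(n_steps):
        # Five-point Laplacian; np.roll wraps around, giving periodicity.
        lap = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
               + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1)
               - 4.0 * u) / dx**2
        u = u + dt * tau * lap
    return u

# Illustrative check on a single Fourier mode, whose exact solution is known.
n = 64
x = np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
u0 = np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y)
u1 = heat_ftcs(u0)
# This mode decays as exp(-8 * pi^2 * tau * t) under the continuous equation.
exact = np.exp(-8 * np.pi**2 * 1e-2) * u0
rel_err = np.linalg.norm(u1 - exact) / np.linalg.norm(exact)
```

With the quoted settings (N = 64, 1000 steps, τ = 10^-2) and the assumed unit-square domain, the stability ratio τ·dt/dx^2 is about 0.04, comfortably below the 1/4 limit, so the explicit scheme is usable here without a smaller time step.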