On the Benefits of Memory for Modeling Time-Dependent PDEs

Authors: Ricardo Buitrago Ruiz, Tanya Marwah, Albert Gu, Andrej Risteski

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that when the PDEs are supplied in low resolution or contain observation noise at train and test time, MemNO significantly outperforms the baselines without memory, with up to 6× reduction in test error. Furthermore, we show that this benefit is particularly pronounced when the PDE solutions have significant high-frequency Fourier modes (e.g., low-viscosity fluid dynamics), and we construct a challenging benchmark dataset consisting of such PDEs.
Researcher Affiliation | Collaboration | Ricardo Buitrago Ruiz1,2, Tanya Marwah1, Albert Gu1,2, Andrej Risteski1 — 1Carnegie Mellon University, 2Cartesia AI
Pseudocode | Yes | In Figure 10 we provide pseudocode for the MemNO framework (Section 5.3) in PyTorch (Paszke et al., 2019).
Open Source Code | Yes | We provide a forked repository with these changes at https://github.com/r-buitrago/LPSDA, which is based on the original repository of Brandstetter et al. (2022), https://github.com/brandstetter-johannes/LPSDA.
Open Datasets | Yes | We used the publicly available dataset of the Burgers equation in the PDEBench repository (Takamoto et al., 2023) with viscosity 0.001, which is available at resolution 1024.
Dataset Splits | Yes | In the experiments in Section 6.1, for each of the four values of the viscosity (0.15, 0.125, 0.1, 0.075), we generated a dataset with spatial resolution 512, with 2048 training samples and 256 test samples. For the Burgers equation, we take the publicly available Burgers dataset of PDEBench (Takamoto et al., 2023) with viscosity 0.001. Out of the 10000 samples of the dataset, we use 10% for testing. For training, we found it sufficient to use 2048 samples.
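The Burgers split described above (10% of 10000 samples for testing, 2048 of the rest for training) can be sketched as follows. The function name, seed, and use of a random permutation are assumptions for illustration; the paper does not specify how samples were selected.

```python
import numpy as np

def split_pdebench_burgers(n_total=10000, n_train=2048, test_frac=0.10, seed=0):
    """Hypothetical split: hold out 10% of samples for testing and take
    2048 of the remaining samples for training (a sketch, not the
    authors' exact procedure)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    n_test = int(test_frac * n_total)          # 1000 test samples
    test_idx = idx[:n_test]
    train_idx = idx[n_test:n_test + n_train]   # 2048 training samples
    return train_idx, test_idx

train_idx, test_idx = split_pdebench_burgers()
```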
Hardware Specification | Yes | We ran our experiments on A6000/A6000-Ada GPUs.
Software Dependencies | Yes | This method is implemented in the diff method of the scipy.fftpack package (Virtanen et al., 2020).
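The `scipy.fftpack.diff` routine cited here computes spectral (Fourier) derivatives of periodic signals. A minimal check on a sine wave, where the spectral derivative recovers the cosine essentially to machine precision:

```python
import numpy as np
from scipy.fftpack import diff

# Periodic grid on [0, 2*pi) and a smooth test function.
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(x)

# First spectral derivative of u, assuming period 2*pi.
du = diff(u, order=1, period=2 * np.pi)
```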
Experiment Setup | Yes | All our experiments used a learning rate of 0.001. For the number of epochs, in KS and Burgers the training was done over 200 epochs with cosine annealing learning rate scheduling (Loshchilov & Hutter, 2017); whereas in Navier-Stokes we trained for 300 epochs and halved the learning rate every 90. As for the number of samples, KS and Burgers were trained with 2048 samples and Navier-Stokes with 1024 samples. Lastly, we observed that the batch size was a sensitive hyperparameter for both the memory and memoryless models (it seemed to affect both equally), so we ran a sweep for each experiment to select the best-performing one. In the results shown in the paper, KS and Navier-Stokes use a batch size of 32, and Burgers a batch size of 64.
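The two learning-rate schedules described above can be sketched with the standard formulas: cosine annealing from the base rate to a minimum over the training horizon, and step decay that halves the rate every fixed number of epochs. The `lr_min=0.0` floor is an assumption; the paper states only the base rate of 0.001.

```python
import math

def cosine_lr(epoch, lr0=1e-3, total_epochs=200, lr_min=0.0):
    """Cosine annealing (Loshchilov & Hutter, 2017): decay lr0 -> lr_min
    over total_epochs. Used for KS and Burgers (200 epochs)."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

def step_lr(epoch, lr0=1e-3, halve_every=90):
    """Step decay: halve the learning rate every `halve_every` epochs.
    Used for Navier-Stokes (300 epochs, halved every 90)."""
    return lr0 * 0.5 ** (epoch // halve_every)
```

With PyTorch, the equivalents are `torch.optim.lr_scheduler.CosineAnnealingLR` and `StepLR(step_size=90, gamma=0.5)`.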