On the Benefits of Memory for Modeling Time-Dependent PDEs

Authors: Ricardo Buitrago Ruiz, Tanya Marwah, Albert Gu, Andrej Risteski

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that when the PDEs are supplied in low resolution or contain observation noise at train and test time, MemNO significantly outperforms the baselines without memory, with up to 6× reduction in test error. Furthermore, we show that this benefit is particularly pronounced when the PDE solutions have significant high-frequency Fourier modes (e.g., low-viscosity fluid dynamics), and we construct a challenging benchmark dataset consisting of such PDEs.
Researcher Affiliation | Collaboration | Ricardo Buitrago Ruiz1,2, Tanya Marwah1, Albert Gu1,2, Andrej Risteski1 — 1Carnegie Mellon University, 2Cartesia AI
Pseudocode | Yes | In Figure 10 we provide pseudocode for the MemNO framework (Section 5.3) in PyTorch (Paszke et al., 2019).
Open Source Code | Yes | We provide a forked repository with these changes at https://github.com/r-buitrago/LPSDA, which is based on the original repository of Brandstetter et al. (2022), https://github.com/brandstetter-johannes/LPSDA.
Open Datasets | Yes | We used the publicly available dataset of the Burgers equation in the PDEBench repository (Takamoto et al., 2023) with viscosity 0.001, which is available at resolution 1024.
Dataset Splits | Yes | In the experiments in Section 6.1, for each of the four values of the viscosity (0.15, 0.125, 0.1, 0.075), we generated a dataset with spatial resolution 512, with 2048 training samples and 256 test samples. For the Burgers equation, we take the publicly available Burgers dataset of PDEBench (Takamoto et al., 2023) with viscosity 0.001. Out of the 10000 samples of the dataset, we use 10% for testing. For training, we found it sufficient to use 2048 samples.
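The Burgers split described above (10% of 10000 samples for testing, 2048 of the rest for training) can be sketched as follows. The function name, seed, and use of a random permutation are assumptions for illustration; the paper does not specify how samples were selected.

```python
import numpy as np

def split_pdebench_burgers(n_total=10000, n_train=2048, test_frac=0.10, seed=0):
    """Hypothetical split: hold out 10% of samples for testing and take
    2048 of the remaining samples for training (a sketch, not the
    authors' exact procedure)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    n_test = int(test_frac * n_total)          # 1000 test samples
    test_idx = idx[:n_test]
    train_idx = idx[n_test:n_test + n_train]   # 2048 training samples
    return train_idx, test_idx

train_idx, test_idx = split_pdebench_burgers()
```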
Hardware Specification | Yes | We ran our experiments on A6000/A6000-Ada GPUs.
Software Dependencies | Yes | This method is implemented in the diff method of the scipy.fftpack package (Virtanen et al., 2020).
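The `scipy.fftpack.diff` routine cited here computes spectral (Fourier) derivatives of periodic signals. A minimal check on a sine wave, where the spectral derivative recovers the cosine essentially to machine precision:

```python
import numpy as np
from scipy.fftpack import diff

# Periodic grid on [0, 2*pi) and a smooth test function.
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(x)

# First spectral derivative of u, assuming period 2*pi.
du = diff(u, order=1, period=2 * np.pi)
```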
Experiment Setup | Yes | All our experiments used a learning rate of 0.001. For the number of epochs, in KS and Burgers the training was done over 200 epochs with cosine annealing learning rate scheduling (Loshchilov & Hutter, 2017); whereas in Navier-Stokes we trained for 300 epochs and halved the learning rate every 90. As for the number of samples, KS and Burgers were trained with 2048 samples and Navier-Stokes with 1024 samples. Lastly, we observed that the batch size was a sensitive hyperparameter for both the memory and memoryless models (it seemed to affect both equally), so we ran a sweep for each experiment to select the best-performing one. In the results shown in the paper, KS and Navier-Stokes use a batch size of 32, and Burgers a batch size of 64.
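The two learning-rate schedules described above can be sketched with the standard formulas: cosine annealing from the base rate to a minimum over the training horizon, and step decay that halves the rate every fixed number of epochs. The `lr_min=0.0` floor is an assumption; the paper states only the base rate of 0.001.

```python
import math

def cosine_lr(epoch, lr0=1e-3, total_epochs=200, lr_min=0.0):
    """Cosine annealing (Loshchilov & Hutter, 2017): decay lr0 -> lr_min
    over total_epochs. Used for KS and Burgers (200 epochs)."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

def step_lr(epoch, lr0=1e-3, halve_every=90):
    """Step decay: halve the learning rate every `halve_every` epochs.
    Used for Navier-Stokes (300 epochs, halved every 90)."""
    return lr0 * 0.5 ** (epoch // halve_every)
```

With PyTorch, the equivalents are `torch.optim.lr_scheduler.CosineAnnealingLR` and `StepLR(step_size=90, gamma=0.5)`.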