Zebra: In-Context Generative Pretraining for Solving Parametric PDEs
Authors: Louis Serrano, Armand Kassaï Koupaï, Thomas X Wang, Pierre Erbacher, Patrick Gallinari
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches. |
| Researcher Affiliation | Collaboration | 1Sorbonne Université, CNRS, ISIR, 75005 Paris, France 2Naver Labs Europe, France 3Criteo AI Lab, Paris, France. |
| Pseudocode | No | The paper describes the framework and methods using text and figures, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available on GitHub. |
| Open Datasets | No | We plan to release the code, the weights of the models, and the datasets used in this study upon acceptance. |
| Dataset Splits | Yes | For testing, all methods are evaluated on trajectories with new initial conditions in previously unseen environments. These unseen environments include trajectories with both novel initial conditions and varying parameters, which remain within the training distribution for in-distribution evaluation and extend beyond it for out-of-distribution testing. For each test, we use 120 unseen environments for the 2D datasets and 12 for the 1D datasets, with each environment containing 10 trajectories. |
| Hardware Specification | Yes | Regarding computational resources, training the VQ-VAE in 1D takes approximately 4 hours on an RTX 24 GB GPU, while the transformer component requires around 15 hours. In the 2D setting, both training times increase to approximately 20 hours each on a single A100 80 GB GPU. |
| Software Dependencies | No | The paper mentions various architectures (e.g., Llama, U-Net, FNO) and frameworks (Hugging Face) but does not provide specific version numbers for software libraries (e.g., Python, PyTorch, TensorFlow) or specific solver versions, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | Table 7. Hyperparameters for Zebra's Transformer (columns: Advection, Heat, Burgers, Wave b, Combined, Vorticity 2D, Wave 2D) — max context size: 2048, 2048, 2048, 2048, 2048, 8192, 8192; batch size: 4, 4, 4, 4, 4, 2, 2; num gradient accumulations: 1, 1, 1, 1, 1, 4, 4; hidden size: 256, 256, 256, 256, 256, 384, 512; mlp ratio: 4.0 (all); depth: 8 (all); num heads: 8 (all); vocabulary size: 264, 264, 264, 264, 264, 2056, 2056; start learning rate: 1e-4 (all); weight decay: 1e-4 (all); scheduler: Cosine (all); num epochs: 100, 100, 100, 100, 100, 30, 30 |
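The Table 7 hyperparameters split cleanly into values shared across all seven datasets and values that differ between the 1D datasets (Advection, Heat, Burgers, Wave b, Combined) and the 2D datasets (Vorticity 2D, Wave 2D). A minimal sketch of how these settings could be organized in code is shown below; the dictionary layout, key names, and `config` helper are assumptions for illustration — only the numeric values come from the table.

```python
# Hypothetical reconstruction of Table 7 ("Hyperparameters for Zebra's
# Transformer"). Structure and names are assumptions; values are from the table.
DATASETS_1D = ["advection", "heat", "burgers", "wave_b", "combined"]
DATASETS_2D = ["vorticity_2d", "wave_2d"]

# Settings identical across every dataset column.
SHARED = {
    "mlp_ratio": 4.0,
    "depth": 8,
    "num_heads": 8,
    "start_learning_rate": 1e-4,
    "weight_decay": 1e-4,
    "scheduler": "cosine",
}

# Settings that differ per dataset.
PER_DATASET = {
    # All five 1D datasets share the same column values.
    **{name: {"max_context_size": 2048, "batch_size": 4,
              "num_gradient_accumulations": 1, "hidden_size": 256,
              "vocabulary_size": 264, "num_epochs": 100}
       for name in DATASETS_1D},
    # The 2D datasets differ only in hidden size (384 vs. 512).
    "vorticity_2d": {"max_context_size": 8192, "batch_size": 2,
                     "num_gradient_accumulations": 4, "hidden_size": 384,
                     "vocabulary_size": 2056, "num_epochs": 30},
    "wave_2d": {"max_context_size": 8192, "batch_size": 2,
                "num_gradient_accumulations": 4, "hidden_size": 512,
                "vocabulary_size": 2056, "num_epochs": 30},
}

def config(name):
    """Merge shared and dataset-specific hyperparameters into one dict."""
    return {**SHARED, **PER_DATASET[name]}
```

Note that the effective 2D batch size is batch size × gradient accumulations (2 × 4 = 8 trajectories per optimizer step), which is how the reported batch size of 2 remains trainable within a single A100 80 GB GPU's memory.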