Zebra: In-Context Generative Pretraining for Solving Parametric PDEs

Authors: Louis Serrano, Armand Kassaï Koupaï, Thomas X Wang, Pierre Erbacher, Patrick Gallinari

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.
Researcher Affiliation | Collaboration | 1Sorbonne Université, CNRS, ISIR, 75005 Paris, France 2Naver Labs Europe, France 3Criteo AI Lab, Paris, France.
Pseudocode | No | The paper describes the framework and methods using text and figures, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available on GitHub.
Open Datasets | No | We plan to release the code, the weights of the models, and the datasets used in this study upon acceptance.
Dataset Splits | Yes | For testing, all methods are evaluated on trajectories with new initial conditions in previously unseen environments. These unseen environments include trajectories with both novel initial conditions and varying parameters, which remain within the training distribution for in-distribution evaluation and extend beyond it for out-of-distribution testing. For testing, we use 120 unseen environments for the 2D datasets and 12 for the 1D datasets, with each environment containing 10 trajectories.
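The test-set counts quoted above (120 unseen environments in 2D, 12 in 1D, 10 trajectories each) can be sketched as a small enumeration. This is a hypothetical illustration, not code from the Zebra repository; the function and variable names are assumptions.

```python
# Hypothetical sketch of the test split described in the quote: each unseen
# environment contributes 10 trajectories, with 120 environments for the 2D
# datasets and 12 for the 1D datasets.

def build_eval_split(dim: int, trajectories_per_env: int = 10) -> list[tuple[int, int]]:
    """Return (environment_id, trajectory_id) pairs for the test set."""
    num_envs = 120 if dim == 2 else 12
    return [(env, traj)
            for env in range(num_envs)
            for traj in range(trajectories_per_env)]

split_2d = build_eval_split(dim=2)
split_1d = build_eval_split(dim=1)
print(len(split_2d), len(split_1d))  # → 1200 120
```

So the reported protocol amounts to 1,200 evaluation trajectories per 2D dataset and 120 per 1D dataset.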
Hardware Specification | Yes | Regarding computational resources, training the VQ-VAE in 1D takes approximately 4 hours on an RTX 24 GB GPU, while the transformer component requires around 15 hours. In the 2D setting, both training times increase to approximately 20 hours each on a single A100 80 GB GPU.
Software Dependencies | No | The paper mentions various architectures (e.g., Llama, U-Net, FNO) and frameworks (Hugging Face) but does not provide specific version numbers for software libraries (e.g., Python, PyTorch, TensorFlow) or specific solver versions, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | Table 7. Hyperparameters for Zebra's Transformer:

Hyperparameter | Advection | Heat | Burgers | Wave b | Combined | Vorticity 2D | Wave 2D
max context size | 2048 | 2048 | 2048 | 2048 | 2048 | 8192 | 8192
batch size | 4 | 4 | 4 | 4 | 4 | 2 | 2
num gradient accumulations | 1 | 1 | 1 | 1 | 1 | 4 | 4
hidden size | 256 | 256 | 256 | 256 | 256 | 384 | 512
mlp ratio | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0
depth | 8 | 8 | 8 | 8 | 8 | 8 | 8
num heads | 8 | 8 | 8 | 8 | 8 | 8 | 8
vocabulary size | 264 | 264 | 264 | 264 | 264 | 2056 | 2056
start learning rate | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4
weight decay | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4
scheduler | Cosine | Cosine | Cosine | Cosine | Cosine | Cosine | Cosine
num epochs | 100 | 100 | 100 | 100 | 100 | 30 | 30
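Table 7 splits naturally into one shared 1D configuration and two 2D configurations. A minimal encoding of it, assuming a plain dictionary layout (the key names are illustrative; the numeric values are copied from the table), also makes the effective batch size per optimizer step explicit:

```python
# Table 7 as a config mapping. Key names are assumptions; values come from
# the table. The 1D datasets share one configuration.
SHARED = dict(mlp_ratio=4.0, depth=8, num_heads=8,
              start_lr=1e-4, weight_decay=1e-4, scheduler="cosine")

HPARAMS = {
    **{name: dict(SHARED, max_context=2048, batch_size=4, grad_accum=1,
                  hidden_size=256, vocab_size=264, epochs=100)
       for name in ("Advection", "Heat", "Burgers", "Wave b", "Combined")},
    "Vorticity 2D": dict(SHARED, max_context=8192, batch_size=2, grad_accum=4,
                         hidden_size=384, vocab_size=2056, epochs=30),
    "Wave 2D": dict(SHARED, max_context=8192, batch_size=2, grad_accum=4,
                    hidden_size=512, vocab_size=2056, epochs=30),
}

# Effective batch size per optimizer step = batch_size * num gradient accumulations.
effective = {k: v["batch_size"] * v["grad_accum"] for k, v in HPARAMS.items()}
print(effective["Advection"], effective["Wave 2D"])  # → 4 8
```

Note that the 2D runs trade a smaller per-device batch (2) for 4 gradient-accumulation steps, giving a larger effective batch (8) than the 1D runs (4) despite the longer 8192-token context.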