Zebra: In-Context Generative Pretraining for Solving Parametric PDEs

Authors: Louis Serrano, Armand Kassaï Koupaï, Thomas X Wang, Pierre Erbacher, Patrick Gallinari

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.
Researcher Affiliation | Collaboration | 1Sorbonne Université, CNRS, ISIR, 75005 Paris, France 2Naver Labs Europe, France 3Criteo AI Lab, Paris, France.
Pseudocode | No | The paper describes the framework and methods using text and figures, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available on GitHub.
Open Datasets | No | We plan to release the code, the weights of the models, and the datasets used in this study upon acceptance.
Dataset Splits | Yes | For testing, all methods are evaluated on trajectories with new initial conditions in previously unseen environments. These unseen environments include trajectories with both novel initial conditions and varying parameters, which remain within the training distribution for in-distribution evaluation and extend beyond it for out-of-distribution testing. For testing, we use 120 unseen environments for the 2D datasets and 12 for the 1D datasets, with each environment containing 10 trajectories.
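The test-set counts quoted above (120 unseen environments in 2D, 12 in 1D, 10 trajectories each) can be sketched as a small enumeration. This is a hypothetical illustration, not code from the Zebra repository; the function and variable names are assumptions.

```python
# Hypothetical sketch of the test split described in the quote: each unseen
# environment contributes 10 trajectories, with 120 environments for the 2D
# datasets and 12 for the 1D datasets.

def build_eval_split(dim: int, trajectories_per_env: int = 10) -> list[tuple[int, int]]:
    """Return (environment_id, trajectory_id) pairs for the test set."""
    num_envs = 120 if dim == 2 else 12
    return [(env, traj)
            for env in range(num_envs)
            for traj in range(trajectories_per_env)]

split_2d = build_eval_split(dim=2)
split_1d = build_eval_split(dim=1)
print(len(split_2d), len(split_1d))  # → 1200 120
```

So the reported protocol amounts to 1,200 evaluation trajectories per 2D dataset and 120 per 1D dataset.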
Hardware Specification | Yes | Regarding computational resources, training the VQ-VAE in 1D takes approximately 4 hours on an RTX 24 GB GPU, while the transformer component requires around 15 hours. In the 2D setting, both training times increase to approximately 20 hours each on a single A100 80 GB GPU.
Software Dependencies | No | The paper mentions various architectures (e.g., Llama, U-Net, FNO) and frameworks (Hugging Face) but does not provide specific version numbers for software libraries (e.g., Python, PyTorch, TensorFlow) or specific solver versions, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | Table 7. Hyperparameters for Zebra's Transformer:

Hyperparameter | Advection | Heat | Burgers | Wave b | Combined | Vorticity 2D | Wave 2D
max context size | 2048 | 2048 | 2048 | 2048 | 2048 | 8192 | 8192
batch size | 4 | 4 | 4 | 4 | 4 | 2 | 2
num gradient accumulations | 1 | 1 | 1 | 1 | 1 | 4 | 4
hidden size | 256 | 256 | 256 | 256 | 256 | 384 | 512
mlp ratio | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0
depth | 8 | 8 | 8 | 8 | 8 | 8 | 8
num heads | 8 | 8 | 8 | 8 | 8 | 8 | 8
vocabulary size | 264 | 264 | 264 | 264 | 264 | 2056 | 2056
start learning rate | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4
weight decay | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4 | 1e-4
scheduler | Cosine | Cosine | Cosine | Cosine | Cosine | Cosine | Cosine
num epochs | 100 | 100 | 100 | 100 | 100 | 30 | 30
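Table 7 splits naturally into one shared 1D configuration and two 2D configurations. A minimal encoding of it, assuming a plain dictionary layout (the key names are illustrative; the numeric values are copied from the table), also makes the effective batch size per optimizer step explicit:

```python
# Table 7 as a config mapping. Key names are assumptions; values come from
# the table. The 1D datasets share one configuration.
SHARED = dict(mlp_ratio=4.0, depth=8, num_heads=8,
              start_lr=1e-4, weight_decay=1e-4, scheduler="cosine")

HPARAMS = {
    **{name: dict(SHARED, max_context=2048, batch_size=4, grad_accum=1,
                  hidden_size=256, vocab_size=264, epochs=100)
       for name in ("Advection", "Heat", "Burgers", "Wave b", "Combined")},
    "Vorticity 2D": dict(SHARED, max_context=8192, batch_size=2, grad_accum=4,
                         hidden_size=384, vocab_size=2056, epochs=30),
    "Wave 2D": dict(SHARED, max_context=8192, batch_size=2, grad_accum=4,
                    hidden_size=512, vocab_size=2056, epochs=30),
}

# Effective batch size per optimizer step = batch_size * num gradient accumulations.
effective = {k: v["batch_size"] * v["grad_accum"] for k, v in HPARAMS.items()}
print(effective["Advection"], effective["Wave 2D"])  # → 4 8
```

Note that the 2D runs trade a smaller per-device batch (2) for 4 gradient-accumulation steps, giving a larger effective batch (8) than the 1D runs (4) despite the longer 8192-token context.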