FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries

Authors: Ronak Tali, Ali Rabeh, Cheng-Hau Yang, Mehdi Shadkhah, Samundra Karki, Abhisek Upadhyaya, Suriya Dhakshinamoorthy, Marjan Saadati, Soumik Sarkar, Adarsh Krishnamurthy, Chinmay Hegde, Aditya Balu, Baskar Ganapathysubramanian

DMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark the performance of several methods, including Fourier Neural Operators (FNO), Convolutional Neural Operators (CNO), DeepONets, and recent foundational models. This dataset (here) will be a valuable resource for developing and evaluating AI-for-science approaches, specifically neural PDE solvers, that model complex fluid dynamics around 2D and 3D objects. Table 5: The mean squared errors of various neural operators trained on the 2D LDC dataset. All errors are reported on the testing dataset.
Researcher Affiliation | Academia | 1 Iowa State University, Ames, IA 50011, USA; 2 New York University, New York, NY 10012, USA. {rtali, arabeh, chenghau, mehdish, samundra}@iastate.edu, EMAIL; {snarayan, marjansd, soumiks, adarsh}@iastate.edu, EMAIL; {baditya, baskarg}@iastate.edu
Pseudocode | No | The paper contains detailed mathematical formulations of the SBM for Navier-Stokes and heat transfer, including derivations in the appendix, but it does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | As a step toward reproducibility and ease of use, we have released an end-to-end tutorial in our follow-up work (Rabeh et al., 2024a) and an accompanying GitHub repository (here). We make our code, for select neural operators, publicly available.
Open Datasets | Yes | We provide our dataset on Hugging Face at https://huggingface.co/datasets/BGLab/FlowBench/tree/main as a benchmark for others interested in the development and evaluation of SciML models.
Dataset Splits | Yes | We recommend evaluating trained models on a held-out dataset using the standard 80-20 random split of the prepared dataset.
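The recommended 80-20 random split can be sketched as follows; the sample count and seed here are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical 80-20 random split over sample indices.
# n_samples and the seed are illustrative assumptions.
rng = np.random.default_rng(seed=0)
n_samples = 1000

indices = rng.permutation(n_samples)   # shuffle all sample indices
split = int(0.8 * n_samples)           # 80% of samples go to training
train_idx, test_idx = indices[:split], indices[split:]
```

Models would then be trained on `train_idx` and evaluated only on the held-out `test_idx`.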
Hardware Specification | Yes | All the aforementioned models were trained on a single A100 80GB GPU using the Adam optimizer with a learning rate of 10^-3 and were run for 400 epochs. We gratefully acknowledge support from the NAIRR pilot program for computational access to TACC Frontera.
Software Dependencies | No | The paper mentions using the Adam optimizer and the PETSc linear algebra package, and that data is stored in NumPy compressed (.npz) files, but it does not provide specific version numbers for any software dependencies.
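A minimal sketch of the compressed-NumPy (.npz) storage format the paper mentions; the array names and shapes ("geometry", "velocity") are illustrative assumptions, not FlowBench's actual field names:

```python
import os
import tempfile
import numpy as np

# Hypothetical sample: a geometry mask and a 2-component velocity field.
geometry = np.zeros((64, 64), dtype=np.float32)
velocity = np.ones((2, 64, 64), dtype=np.float32)

# Write one sample as a compressed .npz archive.
path = os.path.join(tempfile.mkdtemp(), "sample.npz")
np.savez_compressed(path, geometry=geometry, velocity=velocity)

# Reading it back: np.load exposes the stored arrays by keyword name.
data = np.load(path)
fields = sorted(data.files)
```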
Experiment Setup | Yes | All the aforementioned models were trained on a single A100 80GB GPU using the Adam optimizer with a learning rate of 10^-3 and were run for 400 epochs.
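For reference, a single Adam update with the paper's learning rate of 10^-3 can be sketched in NumPy; the beta and epsilon values are Adam's common defaults, an assumption since the paper does not state them:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. lr matches the paper; b1/b2/eps are assumed defaults."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# First update on a scalar parameter with unit gradient:
# theta decreases by approximately lr on this step.
theta, m, v = adam_step(1.0, 1.0, m=0.0, v=0.0, t=1)
```

In practice the models would use a framework optimizer (e.g. a PyTorch `Adam` instance) with the same learning rate, looping over the dataset for 400 epochs.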