Neurosymbolic World Models for Sequential Decision Making
Authors: Leonardo Hernandez Cano, Maxine Perroni-Scharf, Neil Dhir, Arun Ramamurthy, Armando Solar-Lezama
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantages of SWMPO by benchmarking its environment modeling capabilities in a number of simulation tasks. Our experiments aim to answer two research questions: (1) how effectively does SWMPO leverage offline data in the synthesis of an environment-specific FSM?; (2) is the resulting FSM accurate enough for model-based RL? |
| Researcher Affiliation | Collaboration | ¹Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts, United States of America; ²Siemens, New Jersey, United States of America. |
| Pseudocode | Yes | Algorithm 1 Neural Primitives, Algorithm 2 FSMSynth, Algorithm 3 SWMPO, Algorithm 4 Greedy Prune |
| Open Source Code | Yes | Implementation details can be found at: https://gitlab.com/da_doomer/swmpo |
| Open Datasets | No | No explicit statement or link is provided for the publicly available datasets used. The paper states that offline data was gathered using controllers in various simulation environments (e.g., 'We use an MPC controller to gather offline data' for Point Mass, 'We use a pre-trained controller provided by the authors to gather offline data' for LiDAR Racing), implying data generation rather than use of pre-existing open datasets. |
| Dataset Splits | No | The paper mentions evaluating models on 'unseen test trajectories' and aggregating errors across 'four test trajectories for each of eight different terrains', but it does not provide specific details on the dataset splits (e.g., percentages for training, validation, and testing, or total sample counts for each split). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions several software packages, such as PyTorch, scikit-learn, Stable-Baselines3, hmmlearn, and SSM, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Appendix C provides detailed tables of hyperparameters for each simulation environment, including 'Table 2. Parameters for Point Mass', 'Table 3. Parameters for Autonomous Driving', 'Table 4. Parameters for Salamander', and 'Table 5. Parameters for Bipedal Walker', listing values for parameters such as hidden sizes, learning rate, and batch size. |