LTL-Constrained Policy Optimization with Cycle Experience Replay

Authors: Ameesh Shah, Cameron Voloshin, Chenxi Yang, Abhinav Verma, Swarat Chaudhuri, Sanjit A. Seshia

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CyclER in three continuous control domains. Our experimental results show that optimizing CyclER in tandem with the existing scalar reward outperforms existing reward-shaping methods at finding performant LTL-satisfying policies.
Researcher Affiliation | Collaboration | Ameesh Shah (EMAIL, UC Berkeley); Cameron Voloshin (EMAIL, Latitude AI); Chenxi Yang (EMAIL, UT Austin); Abhinav Verma (EMAIL, Penn State University); Swarat Chaudhuri (EMAIL, UT Austin); Sanjit A. Seshia (EMAIL, UC Berkeley)
Pseudocode | Yes | Algorithm 1: Cycle Experience Replay (CyclER)
Open Source Code | No | The paper does not provide an explicit statement or link indicating that its code is open-sourced. It mentions using existing tools such as PPO and the Spot tool, but not its own implementation code.
Open Datasets | Yes | We use the Zones environment from the MuJoCo-based Safety-Gymnasium suite of environments (Ji et al., 2023). We use the Buttons environment, also from Safety-Gymnasium.
Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits, as it primarily uses reinforcement learning in simulation environments where data is generated through agent interaction rather than drawn from predefined static datasets.
Hardware Specification | Yes | All experiments were done on a 10-core Intel Core i9 processor with an NVIDIA RTX A4500 GPU.
Software Dependencies | No | The paper mentions using "entropy-regularized PPO", the Adam optimizer, and the Spot tool (Duret-Lutz et al., 2022), but does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | We provide hyperparameter choices for PPO for each experiment in Table 6 and choices for λ in Table 4. In Table 6, batch size refers to the number of trajectories. In our PPO implementation, we use a 3-layer, 64-hidden-unit network with ReLU activations as the actor, and a 3-layer, 64-hidden-unit network with tanh activations between layers and no final activation function as the critic.
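The reported actor/critic architectures can be sketched as plain forward passes. This is a minimal illustration, not the authors' implementation: only the layer counts, hidden widths, and activations come from the paper; the observation/action dimensions, weight initialization, and use of numpy are assumptions for the sake of a self-contained example.

```python
import numpy as np

def mlp_forward(x, weights, biases, hidden_act):
    """3-layer MLP: activation between layers, no activation on the final output."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = hidden_act(h)
    return h

relu = lambda z: np.maximum(z, 0.0)

rng = np.random.default_rng(0)
obs_dim, act_dim, hidden = 8, 2, 64  # obs/act dims are illustrative assumptions

# Actor: 3 layers, 64 hidden units, ReLU activations (per the paper)
actor_shapes = [(obs_dim, hidden), (hidden, hidden), (hidden, act_dim)]
actor_W = [rng.normal(size=s) * 0.1 for s in actor_shapes]
actor_b = [np.zeros(s[1]) for s in actor_shapes]

# Critic: 3 layers, 64 hidden units, tanh between layers, no final activation
critic_shapes = [(obs_dim, hidden), (hidden, hidden), (hidden, 1)]
critic_W = [rng.normal(size=s) * 0.1 for s in critic_shapes]
critic_b = [np.zeros(s[1]) for s in critic_shapes]

obs = rng.normal(size=obs_dim)
action_mean = mlp_forward(obs, actor_W, actor_b, relu)   # shape (act_dim,)
value = mlp_forward(obs, critic_W, critic_b, np.tanh)    # shape (1,)
```

In actual PPO training these outputs would parameterize a Gaussian action distribution and a state-value estimate, respectively; the hyperparameters themselves are listed in Tables 4 and 6 of the paper.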