Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation
Authors: Prashansa Panda, Shalabh Bhatnagar
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also show the results of numerical experiments on three benchmark settings and observe that our critic-actor algorithm performs the best amongst all algorithms. (...) Finally, we show numerical performance comparisons of our algorithm with the AC and a few other algorithms over three different Open AI Gym environments and observe that the CA algorithm shows the best performance amongst all algorithms considered, though by small margins. |
| Researcher Affiliation | Academia | Indian Institute of Science, Bengaluru, India. EMAIL; EMAIL |
| Pseudocode | Yes | Algorithm 1: Two Timescale Critic-Actor Algorithm |
| Open Source Code | Yes | The code for all of our experiments is available at https://github.com/prashu1306/Critic-Actor. |
| Open Datasets | Yes | We present here the results of experiments on three different (open source) Open AI Gym environments namely Frozen Lake, Pendulum and Mountain Car Continuous, respectively, over which we compare the performance of CA with AC as well as the Deep Q-Network (DQN) (Mnih et al. 2015) in the average reward setting, and PPO (Schulman et al. 2017). Detailed descriptions of these environments can be found at https://gymnasium.farama.org/. |
| Dataset Splits | No | The paper mentions 'training the agent for 10,000 steps' and 'averaged over 10 different initial seeds' within Open AI Gym environments. However, it does not specify explicit dataset split percentages, sample counts for distinct train/validation/test sets, or detailed methodologies for partitioning interaction data in a way directly comparable to supervised learning dataset splits. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as GPU/CPU models, processor types, or memory. |
| Software Dependencies | No | The paper mentions 'Open AI Gym environments', 'DQN', and 'PPO' as software frameworks/algorithms used. However, it does not provide specific version numbers for these or any other ancillary software components (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiment. |
| Experiment Setup | No | The paper states that experiments are 'averaged over 10 different initial seeds after training the agent for 10,000 steps' and mentions theoretical step-size relationships (e.g., 'α_t = c_α/(1 + t)^ν, β_t = c_β/(1 + t)^σ, γ_t = c_γ/(1 + t)^ν'). However, it does not provide concrete hyperparameter values (e.g., the specific c_α, c_β, c_γ, ν, σ values used in experiments), batch sizes, network architectures, or optimizer settings for the experimental runs. |
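The step-size schedules quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the polynomially decaying schedules α_t = c_α/(1 + t)^ν, β_t = c_β/(1 + t)^σ, γ_t = c_γ/(1 + t)^ν; the constants `c_alpha`, `c_beta`, `c_gamma` and exponents `nu`, `sigma` below are hypothetical, since the paper does not report the values used in its experiments.

```python
def schedule(c, exponent, t):
    """Polynomially decaying step size: c / (1 + t)^exponent."""
    return c / (1.0 + t) ** exponent

# Hypothetical constants (not taken from the paper).
c_alpha, c_beta, c_gamma = 1.0, 1.0, 1.0
nu, sigma = 0.6, 0.9  # hypothetical exponents, chosen with sigma > nu

for t in (0, 100, 10_000):
    alpha_t = schedule(c_alpha, nu, t)
    beta_t = schedule(c_beta, sigma, t)
    gamma_t = schedule(c_gamma, nu, t)
    # With sigma > nu, the ratio beta_t / alpha_t shrinks toward 0 as t
    # grows, which is what separates the two timescales: the recursion
    # driven by beta_t moves on the slower timescale.
    print(t, alpha_t, beta_t, beta_t / alpha_t)
```

Because β_t/α_t = (1 + t)^(ν−σ) → 0 whenever σ > ν, this choice of exponents yields the timescale separation the algorithm relies on.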