Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation
Authors: Prashansa Panda, Shalabh Bhatnagar
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also show the results of numerical experiments on three benchmark settings and observe that our critic-actor algorithm performs the best amongst all algorithms. (...) Finally, we show numerical performance comparisons of our algorithm with the AC and a few other algorithms over three different Open AI Gym environments and observe that the CA algorithm shows the best performance amongst all algorithms considered, though by small margins. |
| Researcher Affiliation | Academia | Indian Institute of Science, Bengaluru, India. EMAIL; EMAIL |
| Pseudocode | Yes | Algorithm 1: Two Timescale Critic-Actor Algorithm |
| Open Source Code | Yes | The code for all of our experiments is available at https://github.com/prashu1306/Critic-Actor. |
| Open Datasets | Yes | We present here the results of experiments on three different (open source) Open AI Gym environments namely Frozen Lake, Pendulum and Mountain Car Continuous, respectively, over which we compare the performance of CA with AC as well as the Deep Q-Network (DQN) (Mnih et al. 2015) in the average reward setting, and PPO (Schulman et al. 2017). Detailed descriptions of these environments can be found at https://gymnasium.farama.org/. |
| Dataset Splits | No | The paper mentions 'training the agent for 10,000 steps' and 'averaged over 10 different initial seeds' within Open AI Gym environments. However, it does not specify explicit dataset split percentages, sample counts for distinct train/validation/test sets, or detailed methodologies for partitioning interaction data in a way directly comparable to supervised learning dataset splits. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as GPU/CPU models, processor types, or memory. |
| Software Dependencies | No | The paper mentions 'Open AI Gym environments', 'DQN', and 'PPO' as software frameworks/algorithms used. However, it does not provide specific version numbers for these or any other ancillary software components (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiment. |
| Experiment Setup | No | The paper states that experiments are 'averaged over 10 different initial seeds after training the agent for 10,000 steps' and mentions theoretical step-size relationships (e.g., 'α_t = c_α/(1 + t)^ν, β_t = c_β/(1 + t)^σ, γ_t = c_γ/(1 + t)^ν'). However, it does not provide concrete hyperparameter values (e.g., the specific c_α, c_β, c_γ, ν, σ values used in experiments), batch sizes, network architectures, or optimizer settings for the experimental runs. |
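The step-size schedules quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the polynomially decaying schedules α_t = c_α/(1 + t)^ν, β_t = c_β/(1 + t)^σ, γ_t = c_γ/(1 + t)^ν; the constants `c_alpha`, `c_beta`, `c_gamma` and exponents `nu`, `sigma` below are hypothetical, since the paper does not report the values used in its experiments.

```python
def schedule(c, exponent, t):
    """Polynomially decaying step size: c / (1 + t)^exponent."""
    return c / (1.0 + t) ** exponent

# Hypothetical constants (not taken from the paper).
c_alpha, c_beta, c_gamma = 1.0, 1.0, 1.0
nu, sigma = 0.6, 0.9  # hypothetical exponents, chosen with sigma > nu

for t in (0, 100, 10_000):
    alpha_t = schedule(c_alpha, nu, t)
    beta_t = schedule(c_beta, sigma, t)
    gamma_t = schedule(c_gamma, nu, t)
    # With sigma > nu, the ratio beta_t / alpha_t shrinks toward 0 as t
    # grows, which is what separates the two timescales: the recursion
    # driven by beta_t moves on the slower timescale.
    print(t, alpha_t, beta_t, beta_t / alpha_t)
```

Because β_t/α_t = (1 + t)^(ν−σ) → 0 whenever σ > ν, this choice of exponents yields the timescale separation the algorithm relies on.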