Agent-Centric Actor-Critic for Asynchronous Multi-Agent Reinforcement Learning
Authors: Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Kanghoon Lee, Woohyung Lim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show ACAC accelerates convergence and enhances performance over baselines in complex MARL tasks. |
| Researcher Affiliation | Industry | 1LG AI Research, Seoul, Republic of Korea. Correspondence to: Woohyung Lim <EMAIL>. |
| Pseudocode | No | The paper describes the ACAC algorithm and its components, including the actor-critic update and the GAE modification, in Section 3. However, it does so using descriptive text and mathematical formulations rather than a structured pseudocode block or a clearly labeled algorithm figure. |
| Open Source Code | Yes | The implementation code for this work can be found at https://github.com/LGAI-Research/acac. |
| Open Datasets | Yes | We evaluate our method on two collections of MacDec-POMDP environments: Box Pushing and Overcooked (Xiao et al., 2020b; 2022). Adapted from Gym-Cooking (Wu et al., 2021), the Overcooked environment involves three agents working together to prepare and deliver salads (e.g., tomato, onion). |
| Dataset Splits | No | The paper describes reinforcement learning environments (Box Pushing and Overcooked) in which data is generated through agent-environment interaction, measured in 'Environment steps' and 'Episodes per train'. Experiments are run over five random seeds, and 'Overcooked-Rand' randomizes agent/object positions to increase uncertainty. The paper does not provide training/validation/test dataset splits in the conventional supervised-learning sense, since these are simulated environments rather than fixed datasets. |
| Hardware Specification | Yes | We used an AMD EPYC 7453 28-Core Processor and an NVIDIA A10 GPU for our experiments. The running time of the proposed ACAC typically ranges from approximately 24 to 150 hours on Overcooked environments and from approximately 30 minutes to 2 hours on Box Pushing environments. |
| Software Dependencies | No | The paper mentions using 'transformer implementation from Hugging Face' and building upon 'gym-cooking environment (Wu et al., 2021)' and its 'macro-action version (Xiao et al., 2022)'. While specific tools and environments are named, no version numbers are provided for these software components or other key libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The detailed hyperparameters can be found in Appendix F, Table 1 (common hyper-parameters across the environment collections; values listed in column order Box Pushing / Overcooked / Overcooked-Rand / Overcooked-Large(-Rand)). Total training timesteps: 500K / 20M / 40M / 100M; MLP layer size (actor): [32, 32] / [32, 32] / [128, 64]; RNN layer size (actor): 32 / 32 / 64; MLP layer size (critic): [32, 32] / [128, 64]; RNN layer size (critic): 32 / 64; Discount factor γ: 0.98 / 0.99; Episodes per train: 16 / 8; Episodes per target critic update: 64 / 32; Clipping ratio ϵ: 0.05 / 0.01; Max episode length: 100 / 200; Learning rate (actor): 3e-4; Learning rate (critic): 3e-4; Minibatch size: 8. |
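
For reference, the shared hyper-parameters reported in the Experiment Setup row can be collected into a small configuration sketch. This is not the authors' code: the key names and the `config_for` helper are hypothetical, and where Table 1 lists two values per parameter we assume they map to Box Pushing versus the Overcooked collections; only the numeric values come from the paper's Appendix F.

```python
# Hedged sketch of the common hyper-parameters from Appendix F, Table 1.
# Key names are hypothetical; values are taken from the paper.
COMMON = {
    "lr_actor": 3e-4,       # Learning rate (actor)
    "lr_critic": 3e-4,      # Learning rate (critic)
    "minibatch_size": 8,
}

# Assumption: paired values in Table 1 correspond to
# Box Pushing vs. the Overcooked collections, respectively.
PER_ENV = {
    "box_pushing": {
        "total_timesteps": 500_000,
        "gamma": 0.98,                     # Discount factor
        "episodes_per_train": 16,
        "episodes_per_target_update": 64,  # Target critic update interval
        "clip_ratio": 0.05,                # PPO-style clipping ratio
        "max_episode_length": 100,
    },
    "overcooked": {
        "total_timesteps": 20_000_000,
        "gamma": 0.99,
        "episodes_per_train": 8,
        "episodes_per_target_update": 32,
        "clip_ratio": 0.01,
        "max_episode_length": 200,
    },
}

def config_for(env: str) -> dict:
    """Merge the shared settings with one environment's overrides."""
    return {**COMMON, **PER_ENV[env]}
```

A call such as `config_for("overcooked")` yields a single flat dictionary, which makes it easy to log the exact settings alongside each run when attempting to reproduce the experiments.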