Agent-Centric Actor-Critic for Asynchronous Multi-Agent Reinforcement Learning
Authors: Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Kanghoon Lee, Woohyung Lim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show ACAC accelerates convergence and enhances performance over baselines in complex MARL tasks. |
| Researcher Affiliation | Industry | 1LG AI Research, Seoul, Republic of Korea. Correspondence to: Woohyung Lim <EMAIL>. |
| Pseudocode | No | The paper describes the ACAC algorithm and its components, including the actor-critic update and the GAE modification, in Section 3. However, it does so using descriptive text and mathematical formulations rather than a structured pseudocode block or a clearly labeled algorithm figure. |
| Open Source Code | Yes | The implementation code for this work can be found at https://github.com/LGAI-Research/acac. |
| Open Datasets | Yes | We evaluate our method on two collections of MacDec-POMDP environments: Box Pushing and Overcooked (Xiao et al., 2020b; 2022). Adapted from Gym-Cooking (Wu et al., 2021), the Overcooked environment involves three agents working together to prepare and deliver salads (e.g., tomato, onion). |
| Dataset Splits | No | The paper describes reinforcement learning environments (Box Pushing and Overcooked) in which data is generated through agent-environment interaction, measured in 'Environment steps' and 'Episodes per train'. Experiments are run over five random seeds, and 'Overcooked-Rand' randomizes agent/object positions to increase uncertainty. The paper does not provide training/validation/test dataset splits in the conventional supervised-learning sense, since these are simulated environments rather than fixed datasets. |
| Hardware Specification | Yes | We used an AMD EPYC 7453 28-Core Processor and an NVIDIA A10 GPU for our experiments. The running time of the proposed ACAC typically ranges from approximately 24 to 150 hours on Overcooked environments and from approximately 30 minutes to 2 hours on Box Pushing environments. |
| Software Dependencies | No | The paper mentions using 'transformer implementation from Hugging Face' and building upon 'gym-cooking environment (Wu et al., 2021)' and its 'macro-action version (Xiao et al., 2022)'. While specific tools and environments are named, no version numbers are provided for these software components or other key libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The detailed hyperparameters can be found in Appendix F, Table 1 (common hyper-parameters across the environment collections; values listed in column order Box Pushing / Overcooked / Overcooked-Rand / Overcooked-Large(-Rand)). Total training timesteps: 500K / 20M / 40M / 100M; MLP layer size (actor): [32, 32] / [32, 32] / [128, 64]; RNN layer size (actor): 32 / 32 / 64; MLP layer size (critic): [32, 32] / [128, 64]; RNN layer size (critic): 32 / 64; Discount factor γ: 0.98 / 0.99; Episodes per train: 16 / 8; Episodes per target critic update: 64 / 32; Clipping ratio ϵ: 0.05 / 0.01; Max episode length: 100 / 200; Learning rate (actor): 3e-4; Learning rate (critic): 3e-4; Minibatch size: 8. |
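
For reference, the shared hyper-parameters reported in the Experiment Setup row can be collected into a small configuration sketch. This is not the authors' code: the key names and the `config_for` helper are hypothetical, and where Table 1 lists two values per parameter we assume they map to Box Pushing versus the Overcooked collections; only the numeric values come from the paper's Appendix F.

```python
# Hedged sketch of the common hyper-parameters from Appendix F, Table 1.
# Key names are hypothetical; values are taken from the paper.
COMMON = {
    "lr_actor": 3e-4,       # Learning rate (actor)
    "lr_critic": 3e-4,      # Learning rate (critic)
    "minibatch_size": 8,
}

# Assumption: paired values in Table 1 correspond to
# Box Pushing vs. the Overcooked collections, respectively.
PER_ENV = {
    "box_pushing": {
        "total_timesteps": 500_000,
        "gamma": 0.98,                     # Discount factor
        "episodes_per_train": 16,
        "episodes_per_target_update": 64,  # Target critic update interval
        "clip_ratio": 0.05,                # PPO-style clipping ratio
        "max_episode_length": 100,
    },
    "overcooked": {
        "total_timesteps": 20_000_000,
        "gamma": 0.99,
        "episodes_per_train": 8,
        "episodes_per_target_update": 32,
        "clip_ratio": 0.01,
        "max_episode_length": 200,
    },
}

def config_for(env: str) -> dict:
    """Merge the shared settings with one environment's overrides."""
    return {**COMMON, **PER_ENV[env]}
```

A call such as `config_for("overcooked")` yields a single flat dictionary, which makes it easy to log the exact settings alongside each run when attempting to reproduce the experiments.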