FlickerFusion: Intra-trajectory Domain Generalizing Multi-agent Reinforcement Learning

Authors: Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FLICKERFUSION, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods... The results show that FLICKERFUSION not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings. Table 1 presents the overall results of final mean rewards. FLICKERFUSION ranks first in 10 out of 12 benchmarks...
Researcher Affiliation | Academia | 1Yonsei University, EMAIL; 2KAIST AI, EMAIL
Pseudocode | Yes | The pseudocodes for train and inference modes are presented in Alg. 1 and 2. Algorithm 1: FLICKERFUSION (Train) Algorithm 2: FLICKERFUSION (Inference)
Open Source Code | Yes | Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
Open Datasets | Yes | To ensure that this problem setting does not fall prey to the standardization issue raised by Papoudakis et al. (2021), we enhance Multi Particle Environments (MPE; Lowe et al. (2017)) to MPEv2, consisting of 12 open-sourced benchmarks. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
Dataset Splits | Yes | As visualized in Fig. 3a, each environment has two benchmarks, OOD1 and OOD2, which assess generalization performance by increasing the number of parameterized agents and non-agent entities, respectively. Fine-grained details for each benchmark are presented in Appendix B. ... Table 5: Number of entities for each domain ... n_a^init and n_adv^init represent the initial quantities for agents and adversaries, respectively. n_a^intra and n_adv^intra denote the number of entities added intra-trajectory. These variables are defined either as fixed values or as ranges from which the number of entities can be randomly sampled.
Hardware Specification | Yes | When training for 3 million steps on an RTX 3090 24GB and a 15-core machine, QMIX-MLP → FLICKERFUSION-MLP and QMIX-Attention → FLICKERFUSION-Attention incur only an additional run-time cost of 4646.4s → 4838.3s (+4.1%) and 5560.5s → 6066.1s (+9.1%), respectively.
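The overhead percentages quoted above can be verified directly from the reported timings; a minimal sanity check (the timings and model names come from the paper, the script itself is illustrative):

```python
# Verify the reported run-time overheads of FLICKERFUSION
# relative to its QMIX backbones (timings quoted from the paper).
timings = {
    "MLP": (4646.4, 4838.3),        # QMIX-MLP -> FLICKERFUSION-MLP
    "Attention": (5560.5, 6066.1),  # QMIX-Attention -> FLICKERFUSION-Attention
}

for backbone, (base, augmented) in timings.items():
    overhead = (augmented - base) / base * 100  # relative overhead in percent
    print(f"{backbone}: +{overhead:.1f}%")
```

Both values round to the figures reported in the paper (+4.1% and +9.1%).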
Software Dependencies | No | The paper refers to various MARL backbone methods and model-agnostic domain generalization methods (e.g., QMIX-MLP, QMIX-Attention, MLDG, SMLDG, DG-MAML, Meta Dot Prod, UPDeT, REFIL, CAMA, ODIS, ACORM) but does not provide specific version numbers for any general software dependencies or libraries used for implementation.
Experiment Setup | Yes | For fair comparison, the same degrees of freedom and sample size are used for hyperparameter tuning for each benchmark and method, as reported in Appendix C. Overlapping hyperparameters that may significantly influence performance are equalized across methods. Each seed is trained for 3 million steps, and the 8-sample mean inference reward curve is recorded. ... In the tables below are the hyperparameters used for MLP backbone models (Table 22) and attention backbone models (Table 23). ... Table 24: CAMA Hyperparameters ... Table 25: UPDeT Hyperparameters ... Table 26: ODIS Hyperparameters