FlickerFusion: Intra-trajectory Domain Generalizing Multi-agent Reinforcement Learning

Authors: Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FLICKERFUSION, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods... The results show that FLICKERFUSION not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings. Table 1 presents the overall results of final mean rewards. FLICKERFUSION ranks first in 10 out of 12 benchmarks...
Researcher Affiliation | Academia | 1Yonsei University, EMAIL; 2KAIST AI, EMAIL
Pseudocode | Yes | The pseudocodes for train and inference modes are presented in Alg. 1 and 2. Algorithm 1: FLICKERFUSION (Train) Algorithm 2: FLICKERFUSION (Inference)
Open Source Code | Yes | Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
Open Datasets | Yes | To ensure that this problem setting does not fall prey to the standardization issue raised by Papoudakis et al. (2021), we enhance Multi Particle Environments (MPE; Lowe et al. (2017)) to MPEv2, consisting of 12 open-sourced benchmarks. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
Dataset Splits | Yes | As visualized in Fig. 3a, each environment has two benchmarks, OOD1 and OOD2, which assess generalization performance by increasing the number of parameterized agents and non-agent entities, respectively. Fine-grained details for each benchmark are presented in Appendix B. ... Table 5: Number of entities for each domain ... n_a^init and n_adv^init represent the initial quantities for agents and adversaries, respectively. n_a^intra and n_adv^intra denote the number of entities added intra-trajectory. These variables are defined either as fixed values or as ranges from which the number of entities can be randomly sampled.
Hardware Specification | Yes | When training for 3 million steps on an RTX 3090 24GB and a 15-core machine, QMIX-MLP → FLICKERFUSION-MLP and QMIX-Attention → FLICKERFUSION-Attention incur only an additional run-time cost of 4646.4s → 4838.3s (+4.1%) and 5560.5s → 6066.1s (+9.1%), respectively.
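The overhead percentages quoted above can be verified directly from the reported timings; a minimal sanity check (the timings and model names come from the paper, the script itself is illustrative):

```python
# Verify the reported run-time overheads of FLICKERFUSION
# relative to its QMIX backbones (timings quoted from the paper).
timings = {
    "MLP": (4646.4, 4838.3),        # QMIX-MLP -> FLICKERFUSION-MLP
    "Attention": (5560.5, 6066.1),  # QMIX-Attention -> FLICKERFUSION-Attention
}

for backbone, (base, augmented) in timings.items():
    overhead = (augmented - base) / base * 100  # relative overhead in percent
    print(f"{backbone}: +{overhead:.1f}%")
```

Both values round to the figures reported in the paper (+4.1% and +9.1%).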
Software Dependencies | No | The paper refers to various MARL backbone methods and model-agnostic domain generalization methods (e.g., QMIX-MLP, QMIX-Attention, MLDG, SMLDG, DG-MAML, Meta Dot Prod, UPDeT, REFIL, CAMA, ODIS, ACORM) but does not provide specific version numbers for any general software dependencies or libraries used for implementation.
Experiment Setup | Yes | For fair comparison, the same degrees of freedom and sample size are used for hyperparameter tuning for each benchmark and method, as reported in Appendix C. Overlapping hyperparameters that may significantly influence performance are equalized across methods. Each seed is trained for 3 million steps, and the 8-sample mean inference reward curve is recorded. ... In the tables below are the hyperparameters used for MLP backbone models (Table 22) and attention backbone models (Table 23). ... Table 24: CAMA Hyperparameters ... Table 25: UPDeT Hyperparameters ... Table 26: ODIS Hyperparameters