Learning Strategy Representation for Imitation Learning in Multi-Agent Games

Authors: Shiqi Lei, Kanghoon Lee, Linjing Li, Jinkyoo Park

AAAI 2025

Reproducibility Assessment: Variable, Result, LLM Response
Research Type: Experimental. "We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four. Our approach successfully acquires strategy representations and indicators, thereby identifying dominant trajectories and significantly enhancing existing IL performance across these environments. ... We evaluate our method across three environments to demonstrate the effectiveness of the learned strategy representation in STRIL using estimated indicators. ... In Table 1, we compared the WS of four types of data filtering methods. A hyperparameter search was conducted to identify the appropriate percentile, p, of indicators for each model and environment. Note that all experiments were repeated three times, and the results are reported with error bars."
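The filtering step quoted above, selecting trajectories at or above a percentile p of an estimated indicator, can be sketched as follows. This is an illustrative sketch only: the function name, the toy indicator values, and the choice of p are assumptions, not the paper's implementation.

```python
import numpy as np

def filter_by_indicator(trajectories, indicators, p):
    """Keep trajectories whose indicator is at or above the p-th percentile.

    `trajectories` and `indicators` are parallel sequences; `p` is in [0, 100].
    Illustrative sketch of percentile-based data filtering, not STRIL itself.
    """
    indicators = np.asarray(indicators, dtype=float)
    threshold = np.percentile(indicators, p)  # percentile cutoff on indicator values
    return [traj for traj, ind in zip(trajectories, indicators) if ind >= threshold]

# Toy example: five trajectories with scalar indicators; keep the top half.
trajs = ["t1", "t2", "t3", "t4", "t5"]
inds = [0.1, 0.9, 0.5, 0.7, 0.3]
kept = filter_by_indicator(trajs, inds, 50)
```

In the paper's setting, p itself is tuned by a hyperparameter search per model and environment; the sketch simply fixes it at 50 for concreteness.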
Researcher Affiliation: Collaboration. 1 Institute of Automation, Chinese Academy of Sciences (CASIA); 2 Korea Advanced Institute of Science and Technology (KAIST); 3 Beijing Wenge Technology Co., Ltd.
Pseudocode: No. The paper describes its methods using mathematical equations and diagrams (Figures 1 and 2), but it contains no clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code: No. The paper neither states that the source code is released nor provides a link to a code repository.
Open Datasets: No. The paper states: 'Dataset generation. We employ different methods to create training datasets with diverse demonstrators for the environments.' and 'Behavior models are then selected from multiple intermediate checkpoints to generate the offline data.' While the environments (Two-player Pong, Limit Texas Hold'em, Connect Four) are built on existing platforms (RLCard, PettingZoo), the specific offline datasets generated for the experiments are not stated to be publicly available, and no links or citations for these generated datasets are provided.
Dataset Splits: No. The paper mentions 'We assume that only 5% of the dataset is reward-labeled for EL estimation.' This specifies a reward-labeled subset, but it does not provide training/validation/test splits for the main imitation learning experiments.
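The 5% reward-labeled subset mentioned above can be illustrated with a simple random selection. Note this is a hedged sketch: the paper states only the 5% fraction, and the uniform-random selection scheme, dataset size, and seed here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen for reproducibility of this sketch

n_trajectories = 1000            # toy dataset size (assumption)
labeled_fraction = 0.05          # 5% reward-labeled, as stated in the paper

# Draw a 5% subset without replacement to serve as the reward-labeled portion.
num_labeled = int(n_trajectories * labeled_fraction)
labeled_idx = rng.choice(n_trajectories, size=num_labeled, replace=False)
```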
Hardware Specification: No. The paper gives no details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper mentions algorithms such as Proximal Policy Optimization (PPO) and Deep Q-Networks (DQN) and frameworks such as RLCard and PettingZoo, but it specifies no version numbers for software libraries, programming languages, or other tools used in the implementation.
Experiment Setup: No. The paper mentions 'A hyperparameter search was conducted to identify the appropriate percentile, p, of indicators for each model and environment.' and 'We set Ngame to 2,000.' It also states 'At the beginning of the training, the strategy representation l, which is a trainable variable, is randomly initialized for each trajectory τ.' and 'We use a two-layer MLP as L.' While these setup details are given, the paper lacks specific hyperparameter values (e.g., learning rate, batch size, number of epochs/iterations, optimizer settings) for the main IL algorithms (BC, IQ-Learn, ILEED) that would be needed for reproduction.
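The two setup details that are stated, a trainable strategy representation l randomly initialized per trajectory τ and a two-layer MLP L, can be sketched as below. All dimensions, the initialization scale, and the ReLU activation are assumptions for illustration; the paper does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_strategy_representations(num_trajectories, dim):
    """One randomly initialized, trainable representation l per trajectory,
    as the paper describes; the dimension and scale here are assumed."""
    return rng.normal(scale=0.1, size=(num_trajectories, dim))

def two_layer_mlp(l, W1, b1, W2, b2):
    """A two-layer MLP standing in for L; hidden width and ReLU are assumptions."""
    h = np.maximum(0.0, l @ W1 + b1)  # hidden layer with ReLU activation
    return h @ W2 + b2                # linear output layer

# Toy shapes: 4 trajectories, 8-dim representations, 16 hidden units, 1 output.
L_reps = init_strategy_representations(4, 8)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
out = two_layer_mlp(L_reps, W1, b1, W2, b2)  # shape (4, 1)
```

In training, both the per-trajectory representations and the MLP weights would be updated by gradient descent; this sketch shows only the forward pass.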