Learning to Communicate Through Implicit Communication Channels

Authors: Han Wang, Binbin Chen, Tieying Zhang, Baoxiang Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate the effectiveness of ICP through comprehensive experiments on the tasks of Guessing Numbers, Revealing Goals, and Hanabi (Bard et al., 2020)."
Researcher Affiliation | Collaboration | Han Wang (The Chinese University of Hong Kong, Shenzhen), Binbin Chen (ByteDance Inc.), Tieying Zhang (ByteDance Inc.), Baoxiang Wang (The Chinese University of Hong Kong, Shenzhen; Vector Institute)
Pseudocode | Yes | Algorithm 1: ICP implementation with DIAL and VDN
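Algorithm 1 builds on the VDN value decomposition (Sunehag et al., 2018), in which the joint action-value is the sum of per-agent action-values. A minimal sketch of that decomposition is below; the function name and data layout are illustrative, not taken from the paper.

```python
def vdn_joint_q(per_agent_qs, joint_action):
    """VDN mixing: Q_tot(a_1, ..., a_n) = sum_i Q_i(a_i).

    per_agent_qs: one list of action-values per agent.
    joint_action: the action index each agent selected.
    """
    return sum(qs[a] for qs, a in zip(per_agent_qs, joint_action))

# Two agents, two actions each: Q_tot = Q_1(a_1) + Q_2(a_2) = 0.8 + 0.5
q_tot = vdn_joint_q([[0.2, 0.8], [0.5, 0.1]], [1, 0])
```

Because the mixing is a plain sum, the joint TD error can be backpropagated into each agent's individual Q-network, which is what makes the decentralized-execution setup tractable.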
Open Source Code | Yes | "The code and our designed environments are freely available in the supplementary material."
Open Datasets | Yes | "We validate the effectiveness of ICP through comprehensive experiments on the tasks of Guessing Numbers, Revealing Goals, and Hanabi (Bard et al., 2020). These environments share a common characteristic: they lack direct communication channels, yet agents must collaboratively make decisions to achieve shared rewards. This setting introduces significant challenges, including sparse and delayed reward feedback and difficulty in credit assignment, both temporally and among agents. Despite these hurdles, our experiments on Guessing Numbers and Revealing Goals demonstrate that ICP significantly enhances performance over baseline methods through more efficient information transmission. In Hanabi, a popular card game played by humans, our approach achieved an average score of 24.91 out of 25, surpassing the best available learning algorithm, which obtains 23.81."
Dataset Splits | No | "In the Guessing Numbers experiment, we evaluate the performance of five approaches: VDN-on-policy, VDN-off-policy, ICP with the random initial map approach (ICP-DIAL-RM), ICP with the delayed map approach (ICP-DIAL-DM), and a cheating approach where a direct communication channel is available (DIAL-Cheat). Each approach is evaluated over 1k episodes with 6 random seeds, run on a Linux bare-metal machine with 256 GB RAM and a 3090 Ti GPU for 36 hours. For VDN-off-policy, we begin by warming up the replay buffer until its size exceeds the batch size. During each training step, we add 10 episodes to the replay buffer and randomly sample a batch of episodes from the buffer for training. In contrast, for VDN-on-policy and our proposed method, we use a vectorized environment to sample a batch of episodes at each training step and train on these samples."
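The off-policy procedure quoted above (warm up the replay buffer past the batch size, then add 10 episodes per step and train on a random batch) can be sketched as follows. All names here are illustrative; the paper's actual implementation is in its supplementary material.

```python
import random

def train_off_policy(sample_episode, train_step, batch_size, total_steps,
                     episodes_per_step=10):
    """Sketch of the VDN-off-policy loop described above.

    sample_episode: callable returning one rollout episode.
    train_step: callable consuming a batch (a list) of episodes.
    """
    buffer = []
    # Warm-up: fill the replay buffer until it can supply one batch.
    while len(buffer) < batch_size:
        buffer.append(sample_episode())
    for _ in range(total_steps):
        # Add fresh episodes, then train on a uniformly sampled batch.
        buffer.extend(sample_episode() for _ in range(episodes_per_step))
        batch = random.sample(buffer, batch_size)
        train_step(batch)
    return buffer
```

The on-policy variants differ only in that each training step samples a full batch of fresh episodes from a vectorized environment instead of drawing from a buffer.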
Hardware Specification | Yes | "Each approach is evaluated over 1k episodes with 6 random seeds, run on a Linux bare-metal machine with 256 GB RAM and a 3090 Ti GPU for 36 hours."
Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow) are provided in the paper.
Experiment Setup | Yes | "Specifically, we set the hidden size of the MLP and GRU to 256, use 2 layers in the GRU, and set the learning rate to 5 × 10⁻⁴ and the batch size to 256. The target network update rate is set to 10, γ is set to 0.99, ϵ is set to 0.1, and we apply gradient clipping with a threshold of 10."
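For reference, the reported hyperparameters collected into one place. The key names below are my own shorthand; the paper does not specify a configuration format.

```python
# Hyperparameters as reported in the experiment setup (key names are
# illustrative, not from the paper's code).
config = {
    "hidden_size": 256,            # MLP and GRU hidden size
    "gru_layers": 2,               # number of GRU layers
    "learning_rate": 5e-4,         # 5 × 10⁻⁴
    "batch_size": 256,
    "target_update_interval": 10,  # target network update rate
    "gamma": 0.99,                 # discount factor γ
    "epsilon": 0.1,                # exploration rate ϵ
    "grad_clip": 10.0,             # gradient clipping threshold
}
```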