Unsupervised Translation of Emergent Communication

Authors: Ido Levy, Orr Paradise, Boaz Carmeli, Ron Meir, Shafi Goldwasser, Yonatan Belinkov

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our findings demonstrate UNMT's potential to translate EC, illustrating that task complexity characterized by semantic diversity enhances EC translatability, while higher task complexity with constrained semantic variability exhibits pragmatic EC, which, although challenging to interpret, remains suitable for translation. Our experiments support the potential of UNMT to translate AI-generated languages into human-readable text. Notably, the Inter-category setting showed superior translation quality, evidenced by higher BLEU and METEOR scores. Furthermore, our analysis indicates that greater vocabulary usage (a broader range of symbols appearing in EC messages) and increased entropy (signaling message unpredictability) may pose challenges to translation accuracy. Qualitative analysis of the resulting translations suggests that UNMT successfully captures the main objects or themes in the described images, but not all the fine-grained details.
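The two corpus statistics named above (vocabulary usage and symbol entropy) can be made concrete with a small sketch. The toy corpus and symbol ids below are illustrative assumptions, not data from the paper; only the 64-symbol alphabet size comes from the experiment setup.

```python
import math
from collections import Counter

def vocab_usage(messages, vocab_size):
    """Fraction of the symbol alphabet that appears anywhere in the corpus."""
    used = {s for msg in messages for s in msg}
    return len(used) / vocab_size

def symbol_entropy(messages):
    """Shannon entropy (bits) of the unigram symbol distribution."""
    counts = Counter(s for msg in messages for s in msg)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Toy EC corpus (illustrative): each message is 6 symbol ids from a 64-symbol alphabet.
corpus = [[3, 17, 17, 42, 0, 5], [3, 17, 9, 42, 1, 5]]
print(vocab_usage(corpus, 64))           # 7 distinct symbols out of 64
print(round(symbol_entropy(corpus), 3))
```

Higher values of both statistics correspond, per the analysis quoted above, to harder-to-translate EC corpora.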
Researcher Affiliation | Academia | ¹Technion – Israel Institute of Technology, Haifa, Israel; ²University of California, Berkeley, CA, USA. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the UNMT architecture in three steps: 1. Pre-training, 2. Fine-tuning, and 3. Back-translation and Denoising. These steps are described in paragraph form, not as structured pseudocode or algorithm blocks.
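Since the paper gives the three steps only in prose, the core of step 3 can be sketched as follows. The noise model (word dropout plus a local shuffle) and the helper names are assumptions based on standard UNMT practice, not the paper's exact configuration.

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_k=3, rng=None):
    """Corrupt a sentence for the denoising objective: random word dropout
    followed by a local shuffle (each token moves at most ~shuffle_k slots).
    Assumed noise model; the paper does not spell out its parameters."""
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > drop_prob] or list(tokens)
    order = sorted(range(len(kept)), key=lambda i: i + rng.uniform(0, shuffle_k))
    return [kept[i] for i in order]

def back_translation_pairs(batch, translate):
    """One back-translation round: translate each sentence with the current
    model, then pair (pseudo-translation, original) as supervised data for
    training the reverse direction."""
    return [(translate(sent), sent) for sent in batch]

# Usage with a stand-in "model" (a lambda) for illustration.
noisy = add_noise(["a", "cat", "on", "a", "mat"])
pairs = back_translation_pairs([["a", "cat"]], lambda s: ["<ec>"] + s)
```

In the full pipeline these synthetic pairs and noised sentences would feed the fine-tuned XLM's reconstruction losses; the sketch only shows the data-generation side.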
Open Source Code | No | The paper mentions using the EGG framework and the string2string package, which are third-party tools. There is no explicit statement or link provided for the authors' own source code for the methodology described in the paper.
Open Datasets | Yes | For this study, we employed the MSCOCO dataset (Lin et al. 2014), a diverse collection of 117K complex images that are annotated with various NL concepts. Each image in the dataset is paired with five distinct captions, providing high-quality captions to inject prior knowledge of image descriptions while training our UNMT models, and to serve as valuable reference points for evaluating translation performance.
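The five-captions-per-image pairing described above follows the standard COCO captions annotation schema (an `annotations` list of `{image_id, caption}` records). The toy payload below mirrors that schema for illustration; the real file names and contents are not taken from the paper.

```python
import json
from collections import defaultdict

# Minimal COCO-captions-style payload (illustrative; the real annotation
# files are much larger and include licenses, file names, etc.).
payload = json.loads("""{
  "images": [{"id": 1}, {"id": 2}],
  "annotations": [
    {"image_id": 1, "caption": "a cat on a mat"},
    {"image_id": 1, "caption": "a small cat resting"},
    {"image_id": 2, "caption": "two dogs playing"}
  ]
}""")

# Group all reference captions by image, as needed both for UNMT training
# text and for multi-reference evaluation metrics such as BLEU.
captions_by_image = defaultdict(list)
for ann in payload["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

print(captions_by_image[1])  # both reference captions for image 1
```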
Dataset Splits | No | The paper states: 'The same target images were used for each complexity's test, but a new set of distractors was sampled according to the complexity of the game.' While it mentions a 'test' set, it does not provide specific information regarding the overall training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit standard splits).
Hardware Specification | No | The paper mentions 'AI agents were configured with a hybrid architecture combining elements of LSTMs... and ResNets...' and that 'The ResNet was initialized with pre-trained weights from ImageNet'. However, it does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions 'We used the EGG framework (Kharitonov et al. 2019)' and 'The UNMT model utilized a pre-trained XLM'. It also states 'Metrics were calculated using the string2string package (Suzgun, Shieber, and Jurafsky 2023)'. However, it does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We used the EGG framework (Kharitonov et al. 2019) to train the models with a batch size of 1024 and an initial learning rate of 0.001, using the Adam optimizer. Training epochs were set to 50, with early stopping based on validation loss. In each game, nine distractors are sampled based on the complexity policy. The agents' communication channel is quantized (Carmeli, Meir, and Belinkov 2023), consisting of 64 symbols, represented as binary vectors of length 6, with each message composed of 6 symbols plus an EOS symbol. To ensure robustness, each referential game was run with 5 different random seeds. The UNMT model utilized a pre-trained XLM, which was fine-tuned on both the EC corpus and MSCOCO captions. More details on the EN corpus are provided in Appendix A. See Appendix G for full hyperparameters.
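The channel quantization described above (64 symbols represented as binary vectors of length 6) can be sketched directly, since 64 = 2^6. The encoding convention (most-significant bit first) is an assumption; the paper and the cited quantization work may order bits differently.

```python
def symbol_to_bits(symbol, width=6):
    """Encode a symbol id from a 2**width-symbol alphabet as a binary vector.
    MSB-first ordering is an assumption for illustration."""
    assert 0 <= symbol < 2 ** width
    return [(symbol >> i) & 1 for i in reversed(range(width))]

def bits_to_symbol(bits):
    """Decode a binary vector back to its symbol id."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

# Round trip over the full 64-symbol alphabet: ids 0..63 <-> length-6 vectors.
assert all(bits_to_symbol(symbol_to_bits(s)) == s for s in range(64))
print(symbol_to_bits(5))  # [0, 0, 0, 1, 0, 1]
```

A 6-symbol message plus EOS is then transmitted as seven such vectors over the quantized channel.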