Referential communication in heterogeneous communities of pre-trained visual deep networks
Authors: Matéo Mahaut, Roberto Dessì, Francesca Franzon, Marco Baroni
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After reviewing related work (Section 2) and presenting our general setup (Section 3), we delve into our experiments in Section 4. First, in Section 4.1, we show that it is indeed possible for sets of heterogeneous pre-trained networks to successfully converge on a referent through an induced communication protocol. In Section 4.2, we study referential generalization, showing that the developed protocol is sufficiently flexible that the networks can use it to refer to objects that were not seen during the training phase... Tables 3 and 4 show that communication is at least partially successful at a more granular level than ImageNet1k classes... Table 5 shows that 64-dimensional communication is still possible in this zero-shot dataset-transfer experiment... |
| Researcher Affiliation | Academia | Matéo Mahaut EMAIL; Francesca Franzon EMAIL; Roberto Dessì EMAIL, Universitat Pompeu Fabra; Marco Baroni EMAIL, Universitat Pompeu Fabra and ICREA |
| Pseudocode | Yes | Pseudocode for the referential game is presented in Appendix A. A The one-to-one referential game in pseudocode |
| Open Source Code | Yes | https://github.com/facebookresearch/EGG; scripts from our experiments are in ./egg/zoo/pop |
| Open Datasets | Yes | As nearly all agents rely on vision modules pre-trained on the ILSVRC2012 training set, we sample images from the validation data of that same dataset (50,000 images)... Thus, our ImageNet1k communication training and testing sets are both extracted from the original ILSVRC2012 validation set. Agents are also tested on an out-of-domain (OOD) dataset containing classes from the larger ImageNet21k repository, as pre-processed by Ridnik et al. (2021)... We further test the agents that were communication-trained on ImageNet1k on 3 different datasets... 1) Cifar100 (Krizhevsky et al., 2009)... 2) Places205 (Zhou et al., 2014)... 3) CelebA (Liu et al., 2015)... We provide scripts to reproduce our ImageNet1k and OOD datasets at https://github.com/mahautm/emecom_pop_data |
| Dataset Splits | Yes | we sample images from the validation data of that same dataset (50,000 images) to teach them to play the referential game, while reserving 10% of those images for testing (note that we do not use image annotations). Thus, our ImageNet1k communication training and testing sets are both extracted from the original ILSVRC2012 validation set. We used 90% of the OOD data for testing OOD communication accuracy (Section 4.2 below) and to train the classifiers in the experiments reported in Appendix H. The remaining 10% was used to test the latter classifiers. Batch size is set at 64, the largest value we could robustly fit in GPU memory. As we sample distractors directly from training batches, on each training step the referential game is played 64 times, once with every different image in the batch as target and the other 63 images serving as distractors. |
| Hardware Specification | Yes | Each experiment was conducted using a single NVIDIA A30 GPU. |
| Software Dependencies | No | All experiments are implemented using the EGG toolkit (Kharitonov et al., 2019). Our version of CLIP uses the ViT architecture for image encoding (we use the ViT-based CLIP model from the "PyTorch Image Models" library (Wightman, 2019)... |
| Experiment Setup | Yes | Batch size is set at 64, the largest value we could robustly fit in GPU memory... All parameters needed to reproduce our experiments with the toolkit (both the continuous setup reported here, and the discrete one discussed in Appendix C) can be found in Appendix E. Table 13 (hyperparameters for training continuous communication channels): batch size 64; optimizer Adam; learning rate 1e-4; max message length 1; non-linearity sigmoid; vocab size 16 / 64; receiver hidden dimension 2048; image size 384; receiver cosine temperature 0.1 |
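The batch-as-distractors setup quoted under Dataset Splits (each of the 64 images in a batch is the target of one game, with the other 63 serving as its distractors) amounts to one contrastive training step per batch. The following is a minimal sketch, not the EGG implementation: `sender` and `receiver` are hypothetical stand-ins for the paper's frozen vision modules plus trained heads, and the cosine-with-temperature scoring follows the receiver cosine temperature of 0.1 reported in Table 13.

```python
import torch
import torch.nn.functional as F

def play_batch_game(sender, receiver, images, temperature=0.1):
    """One training step: every image in the batch is the target of one
    referential game, and the remaining images in the same batch act as
    its distractors. `sender`/`receiver` are hypothetical modules mapping
    images to a shared message space."""
    messages = sender(images)       # (B, msg_dim) continuous messages
    candidates = receiver(images)   # (B, msg_dim) candidate embeddings
    # Cosine similarity between each message and every candidate image,
    # scaled by the receiver cosine temperature.
    scores = F.normalize(messages, dim=-1) @ F.normalize(candidates, dim=-1).T
    scores = scores / temperature
    targets = torch.arange(images.size(0))  # game i's target is image i
    loss = F.cross_entropy(scores, targets)
    accuracy = (scores.argmax(dim=-1) == targets).float().mean()
    return loss, accuracy
```

With a batch of 64 images, this single cross-entropy over the 64x64 score matrix plays all 64 games at once, which is why distractor sampling is free once the batch is formed.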
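Under the Table 13 settings, the trained heads sitting on top of the frozen pre-trained vision modules could look roughly as follows. This is a hypothetical sketch (the class names `ContinuousSender` and `Receiver` are ours, not taken from egg/zoo/pop): a sender emits a single continuous message (max message length 1) whose dimensionality is the vocab size (16 or 64), squashed by the sigmoid non-linearity, while the receiver embeds candidate images through a 2048-unit hidden layer into the same space for cosine scoring.

```python
import torch
import torch.nn as nn

class ContinuousSender(nn.Module):
    """Hypothetical sender head: a linear projection from frozen vision
    features to a single continuous message of `vocab_size` dimensions
    (16 or 64 in Table 13), with a sigmoid non-linearity."""
    def __init__(self, feat_dim, vocab_size=16):
        super().__init__()
        self.proj = nn.Linear(feat_dim, vocab_size)

    def forward(self, features):
        return torch.sigmoid(self.proj(features))

class Receiver(nn.Module):
    """Hypothetical receiver head: maps candidate-image features into the
    message space through a hidden layer (receiver hidden dimension 2048
    in Table 13), for cosine-similarity scoring against the message."""
    def __init__(self, feat_dim, vocab_size=16, hidden_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, features):
        return self.net(features)
```

Keeping the vision backbones frozen and training only such small heads is what makes the heterogeneous-community setup tractable on a single A30 GPU.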