Generating Synthetic Data for Unsupervised Federated Learning of Cross-Modal Retrieval

Authors: Tianlong Zhang, Zhe Xue, Adnan Mahmood, Junping Du, Yuchen Dong, Shilong Ou, Lang Feng, Ming-Hsuan Yang, Yuankai Qi

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments using four baselines across three datasets demonstrate that our method performs favorably against state-of-the-art methods. … For evaluating our proposed method, we select three widely-used datasets: MIRFLICKR (Huiskes and Lew 2008), MS COCO (Lin et al. 2014), and NUS-WIDE (Chua et al. 2009).
Researcher Affiliation | Academia | 1) Beijing University of Posts and Telecommunications, 2) Macquarie University, 3) University of California at Merced. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in prose and includes a main architecture diagram (Figure 1), but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | For evaluating our proposed method, we select three widely-used datasets: MIRFLICKR (Huiskes and Lew 2008), MS COCO (Lin et al. 2014), and NUS-WIDE (Chua et al. 2009).
Dataset Splits | No | The paper mentions using a Dirichlet distribution to generate non-IID data and specifies the total number of samples for NUS-WIDE (over 215,000 sample pairs), but does not explicitly provide training, validation, or test splits (e.g., percentages, specific counts, or references to predefined splits).
Hardware Specification | Yes | All the experiments are implemented on an RTX A6000 GPU.
Software Dependencies | No | The paper mentions using pre-trained CNN-F and BERT models and the Adam optimizer, but does not provide version numbers for these or any other software dependencies, nor does it specify the programming language or libraries used with versions.
Experiment Setup | Yes | We use the pre-trained CNN-F (Chatfield et al. 2014) to extract each image's 2,048-dimension feature representation and use BERT (Devlin et al. 2019) to extract the 2,048-dimension feature representation for each text. We apply the Adam optimizer with a batch size of 64 and a learning rate of 5×10⁻⁵. We apply the Dirichlet distribution to obtain non-IID data, with the parameter β controlling the distribution, where β is set to 0.1 by default. The number of prototypes is set to {20, 40, 20} for the three datasets separately. The communication rounds are set to {50, 100, 70} for the three datasets separately. The local training epoch and global training epoch are both set to 30. Every time the generator is updated, the discriminator is updated 6 times. σ1 and σ2 are temperature coefficients set to 0.5. The adopted values for µ are {0.2, 0.3, 0.2} for the three datasets separately in our experiments.
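The β-controlled Dirichlet split mentioned under Dataset Splits and Experiment Setup is a standard way to simulate non-IID federated clients. A minimal sketch follows; `dirichlet_partition` and its arguments are illustrative names, not from the paper, and the sketch assumes per-sample class labels are available to skew (smaller β yields a more non-IID split).

```python
import numpy as np

def dirichlet_partition(labels, num_clients, beta=0.1, seed=0):
    """Split sample indices across clients with label proportions
    drawn from Dir(beta) per class; smaller beta -> more non-IID."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class c assigned to each client.
        props = rng.dirichlet(np.full(num_clients, beta))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(client_idx, np.split(idx, cuts)):
            client.extend(part.tolist())
    return client_idx
```

With β = 0.1 (the paper's default) most of each class typically lands on one or two clients, which is the heterogeneity the method is evaluated against.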
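The Experiment Setup row scatters hyperparameters across several sentences; collecting them in one place makes the configuration easier to reproduce. The key names below are our own, and the mapping of the per-dataset triples to (MIRFLICKR, MS COCO, NUS-WIDE) is an assumption based on the order in which the paper lists the datasets.

```python
# Hyperparameters as reported in the paper. Dataset-to-value mapping
# is assumed from the paper's dataset listing order.
CONFIG = {
    "batch_size": 64,
    "learning_rate": 5e-5,            # Adam optimizer
    "dirichlet_beta": 0.1,            # non-IID skew parameter
    "num_prototypes": {"MIRFLICKR": 20, "MS COCO": 40, "NUS-WIDE": 20},
    "communication_rounds": {"MIRFLICKR": 50, "MS COCO": 100, "NUS-WIDE": 70},
    "local_epochs": 30,
    "global_epochs": 30,
    "discriminator_updates_per_generator_update": 6,
    "sigma1": 0.5,                    # temperature coefficient
    "sigma2": 0.5,                    # temperature coefficient
    "mu": {"MIRFLICKR": 0.2, "MS COCO": 0.3, "NUS-WIDE": 0.2},
}
```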