Federated Binary Matrix Factorization Using Proximal Optimization

Authors: Sebastian Dalleiger, Jilles Vreeken, Michael Kamp

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive empirical evaluation shows that our algorithm outperforms, in quality and efficacy, federation schemes of state-of-the-art BMF methods on a diverse set of real-world and synthetic data.
Researcher Affiliation | Academia | 1 KTH Royal Institute of Technology, 2 CISPA Helmholtz Center for Information Security, 3 Institute for AI in Medicine, UK Essen and Ruhr University Bochum; Monash University
Pseudocode | Yes | Algorithm 1: Federated Binary Matrix Factorization with FELB
Open Source Code | Yes | We provide the source code, datasets, synthetic dataset generator (https://doi.org/10.5281/zenodo.14501661), and additional information regarding reproducibility in Apx. E.
Open Datasets | Yes | We include Goodreads (Kotkov et al. 2022) for books and Movielens (Harper and Konstan 2015) and Netflix (Netflix, Inc. 2009) for movies, where user ratings ≥ 3.5 are binarized to 1. In life sciences, we use TCGA (Institute 2005) for cancer genomics, HPA (Bakken et al. 2021; Sjöstedt, Zhong, et al. 2020) for single-cell proteomics, and Genomics (Oleksyk, Gonçalo, et al. 2015) for mutation data. [...] For social science, we analyze poverty (Pov) and income (Inc) using the ACS (U.S. Census Bureau 2023) dataset, binarizing with one-hot encoding utilizing Folktables (Ding et al. 2021). In natural language processing, we study higher-order word co-occurrences in arXiv cs.LG abstracts (Collaboration 2023).
Dataset Splits | Yes | To create data scarcity, we fix the dataset size to 2^16 and increase the number of clients from 2^2 to 2^9, thus iteratively reducing the sample count per client. [...] To evaluate under data abundance, we scale the number of samples by increasing the number of clients from 2^2 to 2^9, maintaining a constant sample count of 500 per client.
Hardware Specification | Yes | We implement FELB in the Julia language and run experiments on 32 CPU cores of an AMD EPYC 7702 or one NVIDIA A40 GPU, reporting wall-clock time in seconds.
Software Dependencies | No | The paper states 'We implement FELB in the Julia language' but does not specify version numbers for Julia or for any other software libraries.
Experiment Setup | Yes | In all experiments, we limit each algorithm run to 12h in total. [...] Employing a fixed number of 10 clients, we applied federated ASSO, GRECOND, MEBF, ELBMF, and ZHANG, alongside FELB and FELBMU to each dataset. [...] To create data scarcity, we fix the dataset size to 2^16 and increase the number of clients from 2^2 to 2^9. [...] To evaluate under data abundance, we scale the number of samples by increasing the number of clients from 2^2 to 2^9, maintaining a constant sample count of 500 per client. [...] We limit the federation to a reasonable C = 50 clients, on which we compare federated methods [...] synchronizing after every b = 10 local optimization rounds.
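The scarcity and abundance protocols quoted above determine the per-client sample counts. A minimal sketch of that arithmetic: only the totals (2^16 fixed dataset size, 2^2 to 2^9 clients, 500 samples per client) come from the paper; the helper names and even-split assumption are illustrative.

```python
# Sketch of the client-scaling protocols described in the experiment setup.
# An even split across clients is an assumption; the paper only states totals.

def scarcity_samples_per_client(total=2**16, exponents=range(2, 10)):
    """Fixed dataset size divided among a growing number of clients."""
    return {2**k: total // 2**k for k in exponents}

def abundance_total_samples(per_client=500, exponents=range(2, 10)):
    """Constant per-client sample count; the total grows with the clients."""
    return {2**k: per_client * 2**k for k in exponents}

# Under scarcity, 4 clients see 16384 samples each, 512 clients only 128.
# Under abundance, the total grows from 2000 samples to 256000.
print(scarcity_samples_per_client())
print(abundance_total_samples())
```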
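The federation schedule in the setup row (C = 50 clients, synchronization after every b = 10 local rounds) can be sketched as a simple loop. Everything below except the constants C and b is a placeholder assumption: `local_update` and plain averaging stand in for FELB's actual proximal local optimization and aggregation, which the quoted text does not specify.

```python
import random

# Hypothetical skeleton of the federation schedule from the experiment setup:
# C = 50 clients, synchronizing after every b = 10 local optimization rounds.
C, b = 50, 10
random.seed(0)

def local_update(param, rounds):
    # Placeholder for `rounds` local optimization steps on one client's data
    # (FELB's actual proximal updates are not described in the quoted text).
    return param + 0.01 * rounds * random.uniform(-1, 1)

def synchronize(params):
    # Placeholder aggregation: average the clients' (scalar) parameters.
    return sum(params) / len(params)

shared = 0.0                      # shared model parameter (toy, scalar)
for _ in range(5):                # a few global synchronization rounds
    local = [local_update(shared, b) for _ in range(C)]
    shared = synchronize(local)
```

Each global round therefore costs one synchronization message per client for every b local steps, which is the communication/computation trade-off the setup row fixes.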