Creating Coherence in Federated Non-Negative Matrix Factorization

Authors: Sebastian Dalleiger, Aristides Gionis

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. Evidence: "On a diverse set of real-world and synthetic datasets, we demonstrate the effectiveness of our methods." From Section 4 (Experiments): "Having introduced our algorithms, we now systematically evaluate their practical performance. We implemented our methods in the Julia programming language and ran experiments on 16 cores of an AMD EPYC 7702 CPU and a single NVIDIA A40 GPU, reporting wall-clock time."
Researcher Affiliation: Academia. Evidence: "Sebastian Dalleiger, Aristides Gionis, KTH Royal Institute of Technology, Stockholm, Sweden."
Pseudocode: Yes. The paper includes Algorithm 1 (Fixed-point Barycenter) and Algorithm 2 (FNMF: Federated NMF).
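To make the federated-NMF setting concrete, the following is a minimal illustrative sketch in Python (the paper's actual implementation is in Julia). It assumes the common federated-NMF layout in which each client holds a block of samples and a private factor, while a shared dictionary is aggregated by the server; here the server simply averages the clients' dictionary copies, which is a hypothetical stand-in for the paper's barycenter-based aggregation, not the authors' algorithm:

```python
import numpy as np

def local_nmf_step(X, W, H, eps=1e-9):
    # One multiplicative-update step (Lee-Seung style) on one client's
    # data block X ~= W @ H.  W is the client-local factor, H the
    # shared dictionary over features.
    W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    H *= (W.T @ X) / ((W.T @ W) @ H + eps)
    return W, H

def federated_round(blocks, Ws, H):
    # One communication round: every client refines its local factor and
    # a private copy of the shared dictionary; the server then aggregates
    # the copies by plain averaging (an assumption made for illustration;
    # the paper instead uses a fixed-point barycenter).
    H_copies = []
    for i, X in enumerate(blocks):
        Ws[i], Hc = local_nmf_step(X, Ws[i], H.copy())
        H_copies.append(Hc)
    return Ws, np.mean(H_copies, axis=0)

rng = np.random.default_rng(0)
k, d = 5, 30
blocks = [rng.random((20, d)) for _ in range(3)]  # three clients' data
Ws = [rng.random((20, k)) for _ in blocks]        # client-local factors
H = rng.random((k, d))                            # shared dictionary

err0 = sum(np.linalg.norm(X - W @ H) for X, W in zip(blocks, Ws))
for _ in range(50):
    Ws, H = federated_round(blocks, Ws, H)
err = sum(np.linalg.norm(X - W @ H) for X, W in zip(blocks, Ws))
```

Multiplicative updates keep all factors non-negative by construction, which is why they are a natural local step for NMF; the aggregation rule is the part the paper's barycenter algorithm is designed to do better than naive averaging.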
Open Source Code: Yes. Replication material: https://doi.org/10.5281/zenodo.14501532. Evidence: "We provide the source code, datasets, synthetic dataset generator, hyperparameters, and other information needed for reproducibility."
Open Datasets: Yes. Replication material: https://doi.org/10.5281/zenodo.14501532. Evidence: "We provide the source code, datasets, synthetic dataset generator, hyperparameters, and other information needed for reproducibility." The paper uses Goodreads (Kotkov et al. 2022) (user-book interactions), Netflix Prize (Netflix, Inc. 2009), and MovieLens 25M (Harper and Konstan 2015) (user-movie ratings). For the biomedical domain, it uses TCGA (National Cancer Institute 2005) cancer gene expressions as well as the HPA single-cell brain protein data from the Human Protein Atlas (Sjöstedt, Zhong, et al. 2020). For computer vision, it uses MNIST handwritten digits (LeCun et al. 1998), Fashion-MNIST clothing images (Xiao, Rasul, and Vollgraf 2017), and CIFAR-10 tiny images (Krizhevsky and Hinton 2009). For natural language processing, it extracts 200 000 random rows from the tf-idf matrix of all stopword-free, lemmatized arXiv abstracts (arXiv.org Collaborations 2024).
Dataset Splits: No. The paper describes generating synthetic data and distributing it across clients (e.g., "distribute a fixed amount of data (left) and proportionally-growing data (right) to an increasing number of clients"), and mentions using the real-world datasets with 50 clients. However, it does not specify how the datasets were split into training, validation, or test sets (no percentages, sample counts, or splitting methodology for reproduction).
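The client-distribution setup quoted above (a fixed pool of samples partitioned across a growing number of clients) can be sketched as follows; the function name and the random row partitioning are illustrative assumptions, since the paper's exact assignment scheme is not given in the main text:

```python
import numpy as np

def distribute_rows(X, n_clients, seed=0):
    # Randomly partition the rows (samples) of a data matrix across
    # n_clients clients, approximating the "fixed amount of data"
    # setting.  The actual split used in the paper is not specified,
    # so this scheme is purely illustrative.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    return [X[idx] for idx in np.array_split(perm, n_clients)]

X = np.arange(1000 * 4, dtype=float).reshape(1000, 4)
clients = distribute_rows(X, 50)  # e.g. 50 clients, as for the real data
```

With `np.array_split`, every sample lands on exactly one client, so the union of the client blocks recovers the full dataset.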
Hardware Specification: Yes. Evidence: "We implemented our methods in the Julia programming language and ran experiments on 16 cores of an AMD EPYC 7702 CPU and a single NVIDIA A40 GPU, reporting wall-clock time."
Software Dependencies: No. The paper states that the methods were implemented in the Julia programming language but gives no version numbers for Julia or for any other libraries, frameworks, or solvers used in the implementation.
Experiment Setup: No. The paper states: "We provide the source code, datasets, synthetic dataset generator, hyperparameters, and other information needed for reproducibility" (replication material: https://doi.org/10.5281/zenodo.14501532). However, it does not explicitly list concrete hyperparameter values, training configurations, or system-level settings within the main text.