FedTMOS: Efficient One-Shot Federated Learning with Tsetlin Machine
Authors: Shannon How, Jagmohan Chauhan, Geoff Merrett, Jonathon Hare
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that FedTMOS significantly outperforms its ensemble counterpart by an average of 6.16%, and the leading state-of-the-art OFL baselines by 7.22% across various OFL settings. |
| Researcher Affiliation | Academia | Shannon How Shi Qi, Jagmohan Chauhan, Geoff V Merrett, Jonathan Hare University of Southampton, UK EMAIL,EMAIL |
| Pseudocode | Yes | The overall algorithm is summarized in Algorithm 1. Algorithm 1: FedTMOS; Algorithm 2: reassign_weights(cluster_info, cluster_means, ϕ); Algorithm 3: average_models(final_models). |
| Open Source Code | Yes | The code is available at: https://github.com/shannonhsq/FedTMOS. |
| Open Datasets | Yes | Datasets. We evaluated our approach on four image datasets widely utilized in FL literature: MNIST (Deng, 2012), F-MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | To simulate heterogeneity, we applied two methods: (1) sampling class priors from a Dirichlet distribution, Dir(α), as described in Hsu et al. (2019), where α controls the degree of heterogeneity in the data splits; and (2) distributing the data such that each client possesses samples from only β classes, S(β). In our experiments, we simulated non-IID settings using α = 0.05, 0.1, 0.3 and β = 2, 3, 4. ... Following Diao et al. (2023), half of the test dataset served as a public dataset for Distilled FedOV, and we evaluated all algorithms on the same subset, ensuring consistency across evaluations. |
| Hardware Specification | No | We evaluated the average latency for model aggregation at the server using a standard compute node equipped with a single GPU core. ... We computed the average training latency of the models on each client with a compute node with 32 CPU cores. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | For CNN-based algorithms, we used a 5-layer CNN architecture with a batch size of 128, as per Dai et al. (2024) and Jhunjhunwala et al. For FedTMOS, we fixed k = 30 for the Dir partitions to account for class distribution variability among clients, and k = 10 for the S partition. ... All algorithms were trained for 30 local epochs, as outlined in Jhunjhunwala et al. ... For FedTMOS, we used the following CTM model configurations for each dataset. Table 4 (FedTMOS Model Configuration; columns: MNIST, F-MNIST, SVHN, CIFAR-10 Adaptive Threshold, CIFAR-10 3x3 CT, CIFAR-10 4x4 CT): Number of Clauses = 100, 200, 1100, 200, 200, 200; Feedback Threshold = 1000, 1000, 2000, 400, 300, 300; Learning Sensitivity = 5, 5, 5, 5, 5, 5; Patch Dimensions = (10,10), (5,5), (5,5), (10,10), (3,3), (4,4). ... For FedTMOS, as we wanted to constrain the final server model to a size smaller than or equal to its CNN counterparts, we set various ϕ values: for the MNIST, F-MNIST, SVHN and CIFAR-10 datasets we used ϕ = {4, 3, 3, (3, 3, 1)} respectively. ... We used threshold = {0.5, 0.5, 0.3, 0.6} respectively for the MNIST, F-MNIST, SVHN and CIFAR-10 datasets. |
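The Dir(α) non-IID split described under Dataset Splits can be sketched as follows. This is a minimal standard-library illustration of the technique from Hsu et al. (2019), not the paper's released code; the function name `dirichlet_partition` and its signature are hypothetical.

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients via Dirichlet class priors.

    For each class we draw a proportion vector p ~ Dir(alpha) over the
    clients (as normalized Gamma(alpha, 1) draws) and cut that class's
    samples accordingly. Smaller alpha yields more skewed, heterogeneous
    splits (e.g. alpha = 0.05 as in the paper's hardest setting).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    client_indices = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet(alpha) sample via normalized Gamma(alpha, 1) draws
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        props = [g / total for g in gammas]
        # Cumulative cut points over this class's shuffled samples
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            end = len(idxs) if c == num_clients - 1 else int(cum * len(idxs))
            client_indices[c].extend(idxs[start:end])
            start = end
    return client_indices

# Example: 1000 samples over 10 classes, 10 clients, alpha = 0.1 (highly non-IID)
splits = dirichlet_partition([i % 10 for i in range(1000)], num_clients=10, alpha=0.1)
```

Every sample is assigned to exactly one client, so the union of the returned index lists is a disjoint cover of the dataset; the S(β) partition would instead hand each client samples from only β fixed classes.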