FedTMOS: Efficient One-Shot Federated Learning with Tsetlin Machine
Authors: Shannon How, Jagmohan Chauhan, Geoff Merrett, Jonathon Hare
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that FedTMOS significantly outperforms its ensemble counterpart by an average of 6.16%, and the leading state-of-the-art OFL baselines by 7.22% across various OFL settings. |
| Researcher Affiliation | Academia | Shannon How Shi Qi, Jagmohan Chauhan, Geoff V Merrett, Jonathan Hare University of Southampton, UK EMAIL,EMAIL |
| Pseudocode | Yes | The overall algorithm is summarized in Algorithm 1. Algorithm 1: FedTMOS; Algorithm 2: reassign_weights(cluster_info, cluster_means, ϕ); Algorithm 3: average_models(final_models). |
| Open Source Code | Yes | The code is available at: https://github.com/shannonhsq/FedTMOS. |
| Open Datasets | Yes | Datasets. We evaluated our approach on four image datasets widely utilized in FL literature: MNIST (Deng, 2012), F-MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | To simulate heterogeneity, we applied two methods: (1) sampling class priors from a Dirichlet distribution, Dir(α), as described in Hsu et al. (2019), where α controls the degree of heterogeneity in the data splits; and (2) distributing the data such that each client possesses samples from only β classes, S(β). In our experiments, we simulated non-IID settings using α = 0.05, 0.1, 0.3 and β = 2, 3, 4. ... Following Diao et al. (2023), half of the test dataset served as a public dataset for Distilled FedOV, and we evaluated all algorithms on the same subset, ensuring consistency across evaluations. |
| Hardware Specification | No | We evaluated the average latency for model aggregation at the server using a standard compute node equipped with a single GPU core. ... We computed the average training latency of the models on each client with a compute node with 32 CPU cores. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | For CNN-based algorithms, we used a 5-layer CNN architecture with a batch size of 128, as per Dai et al. (2024) and Jhunjhunwala et al. For FedTMOS, we fixed k = 30 for the Dir partitions to account for class distribution variability among clients, and k = 10 for the S partition. ... All algorithms were trained for 30 local epochs, as outlined in Jhunjhunwala et al. ... For FedTMOS, we used the following CTM model configurations for each dataset. Table 4 (FedTMOS Model Configuration; columns: MNIST, F-MNIST, SVHN, CIFAR-10 Adaptive Threshold, CIFAR-10 3x3 CT, CIFAR-10 4x4 CT): Number of Clauses = 100, 200, 1100, 200, 200, 200; Feedback Threshold = 1000, 1000, 2000, 400, 300, 300; Learning Sensitivity = 5, 5, 5, 5, 5, 5; Patch Dimensions = (10,10), (5,5), (5,5), (10,10), (3,3), (4,4). ... For FedTMOS, as we wanted to constrain the final server model to a size smaller than or equal to its CNN counterparts, we set various ϕ values: for the MNIST, F-MNIST, SVHN and CIFAR-10 datasets we used ϕ = {4, 3, 3, (3, 3, 1)} respectively. ... We used threshold = {0.5, 0.5, 0.3, 0.6} respectively for the MNIST, F-MNIST, SVHN and CIFAR-10 datasets. |
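The Dir(α) non-IID split described under Dataset Splits can be sketched as follows. This is a minimal standard-library illustration of the technique from Hsu et al. (2019), not the paper's released code; the function name `dirichlet_partition` and its signature are hypothetical.

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients via Dirichlet class priors.

    For each class we draw a proportion vector p ~ Dir(alpha) over the
    clients (as normalized Gamma(alpha, 1) draws) and cut that class's
    samples accordingly. Smaller alpha yields more skewed, heterogeneous
    splits (e.g. alpha = 0.05 as in the paper's hardest setting).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    client_indices = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet(alpha) sample via normalized Gamma(alpha, 1) draws
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        props = [g / total for g in gammas]
        # Cumulative cut points over this class's shuffled samples
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            end = len(idxs) if c == num_clients - 1 else int(cum * len(idxs))
            client_indices[c].extend(idxs[start:end])
            start = end
    return client_indices

# Example: 1000 samples over 10 classes, 10 clients, alpha = 0.1 (highly non-IID)
splits = dirichlet_partition([i % 10 for i in range(1000)], num_clients=10, alpha=0.1)
```

Every sample is assigned to exactly one client, so the union of the returned index lists is a disjoint cover of the dataset; the S(β) partition would instead hand each client samples from only β fixed classes.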