Gap-Dependent Bounds for Federated $Q$-Learning
Authors: Haochen Zhang, Zhong Zheng, Lingzhou Xue
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments. All experiments are conducted in a synthetic environment to demonstrate the $\log T$-type regret and the reduced communication cost bound, with the coefficient of the main $O(\log T)$ term being independent of $M$, $S$, and $A$, for the Fed Q-Hoeffding algorithm (Zheng et al., 2024). |
| Researcher Affiliation | Academia | 1Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA. Correspondence to: Lingzhou Xue <EMAIL>. |
| Pseudocode | Yes | Details are provided in Algorithm 1 and Algorithm 2 in Appendix C.1. Algorithm 1 Fed Q-Hoeffding (Central Server) Algorithm 2 Fed Q-Hoeffding (Agent m in round k) |
| Open Source Code | Yes | The code for the numerical experiments is included in the supplementary materials along with the submission. |
| Open Datasets | No | All experiments are conducted in a synthetic environment to demonstrate the $\log T$-type regret and the reduced communication cost bound, with the coefficient of the main $O(\log T)$ term being independent of $M$, $S$, and $A$, for the Fed Q-Hoeffding algorithm (Zheng et al., 2024). We follow Zheng et al. (2024) and generate a synthetic environment to evaluate the proposed algorithms on a tabular episodic MDP. |
| Dataset Splits | No | The paper describes generating data within a synthetic environment and conducting numerical experiments. It mentions generating '$10^7$ episodes for each agent'. However, it does not specify any dataset splits (e.g., training, testing, or validation percentages or counts) for a predefined or externally available dataset, as it creates its own experimental data. |
| Hardware Specification | Yes | All the experiments are run on a server with Intel Xeon E5-2650v4 (2.2GHz) and 100 cores. Each replication is limited to a single core and 50GB RAM. |
| Software Dependencies | No | The paper refers to algorithms like 'Fed Q-Hoeffding algorithm (Zheng et al., 2024)' and 'UCB-Hoeffding algorithm (Jin et al., 2018)'. While these are specific algorithms, the paper does not mention any software packages, libraries, or programming languages with specific version numbers used for their implementation. For example, it does not state 'Python 3.x' or 'PyTorch 1.x'. |
| Experiment Setup | No | The paper mentions setting 'the constant $c$ in the bonus term $b_t$ to be 2 and $\iota = 1$' for the synthetic environment. However, it does not provide a comprehensive list of hyperparameters, optimizer settings, or other detailed configuration parameters typically found in an experimental setup section for reproducibility. For instance, learning rates, batch sizes, or specific initialization strategies are not detailed. |
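The Experiment Setup row above quotes only two constants: $c = 2$ and $\iota = 1$ in the bonus term $b_t$. As a point of reference, a minimal sketch of what such a Hoeffding-style exploration bonus typically looks like is shown below. The $\sqrt{H^3 \iota / t}$ shape follows the UCB-Hoeffding bonus of Jin et al. (2018), which the paper cites; the exact form used in the paper's own code is an assumption here, not confirmed by the report.

```python
import math

def hoeffding_bonus(H: int, t: int, c: float = 2.0, iota: float = 1.0) -> float:
    """Hoeffding-style UCB exploration bonus, b_t = c * sqrt(H^3 * iota / t).

    The sqrt(H^3 * iota / t) shape is the UCB-Hoeffding bonus from
    Jin et al. (2018); c = 2 and iota = 1 match the constants quoted
    in the reproducibility table. H is the episode horizon and t the
    visit count of the current (state, action, step) triple.
    """
    if t < 1:
        raise ValueError("visit count t must be >= 1")
    return c * math.sqrt(H ** 3 * iota / t)

# The bonus shrinks as the visit count t grows, so the optimistic
# Q-value estimate tightens over time.
print(hoeffding_bonus(H=5, t=1))    # largest bonus at the first visit
print(hoeffding_bonus(H=5, t=100))  # much smaller after 100 visits
```

With only these two constants reported, the remaining setup details (learning-rate schedule, environment transition/reward generation) would have to be recovered from the supplementary code mentioned in the Open Source Code row.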