reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes

Authors: Haotian Wu, Gongpu Chen, Deniz Gunduz

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results validate Act2Comm's capability to enable reliable communication while maintaining a certain level of control performance. We evaluate Act2Comm across three distinct MDP environments, as detailed in Appendix D, with communication performance measured by the bit error rate (BER).
Researcher Affiliation	Academia	Haotian Wu, Gongpu Chen, Deniz Gündüz, Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. EMAIL
Pseudocode	Yes	To enhance readers' understanding of the training and inference process, we provide the pseudocode and illustration figures (see Fig. 7) for Act2Comm, detailing both the training and inference phases, as shown in Algorithms 1 and 2.
Open Source Code	Yes	For additional details about the training, the training logs and source code are also available on the project page of this paper.
Open Datasets	No	The paper describes custom MDP environments: "Lucky Wheel", "Catch the Ball", and "Erratic Robot". It does not reference any publicly available datasets or provide access to these environments as datasets. For example, "The Catch the Ball game is set in a 3x3 grid..." and "The Erratic robot game takes place on a 4x4 grid map..." are descriptions of simulation setups, not external datasets.
Dataset Splits	No	The paper does not provide specific training/test/validation dataset splits. Instead, it describes simulations run on custom MDP environments and mentions that "The performance presented is averaged over 20,000 execution times," which implies repeated simulations rather than fixed dataset splits.
Hardware Specification	Yes	The experimental results, presented in Table 4, were obtained using a single GPU-A5000 with 10,000 runs for the Erratic Robot environment.
Software Dependencies	No	The paper mentions "an Adam-based lookahead optimizer (Zhang et al., 2019)" but does not specify version numbers for any programming languages, libraries, or solvers used for the implementation.
Experiment Setup	Yes	For the Act2Comm scheme, we train the model with a batch size of 4096, a learning rate of 0.001, and an Adam-based lookahead optimizer (Zhang et al., 2019). The inner-training for the critic network consists of s_in = 20 steps, with a noise variance of σ_w² = 0.1. Each block has a length of µ = 3, and the temperature parameter is set as γ = 10, γ = 50, γ = 100, γ = 200. The performance presented is averaged over 20,000 execution times. To investigate the trade-offs, we train the Act2Comm model with λ ∈ [0.01, 20]. The detailed architecture of the Act2Comm scheme is provided in Fig. 8b. ...we set d = 32, Lt = 2 and Lt = 4 during the experiments.
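
For anyone attempting a reproduction, the values quoted in the Experiment Setup row can be collected in one place. The following is only a minimal sketch: the class name `Act2CommSetup` and all field names are hypothetical and do not come from the paper or its source code, and the two Lt values are left out because the quoted text does not make clear which parameters they refer to.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Act2CommSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""

    batch_size: int = 4096
    learning_rate: float = 0.001
    critic_inner_steps: int = 20            # s_in in the quoted text
    noise_variance: float = 0.1             # sigma_w^2 in the quoted text
    block_length: int = 3                   # mu in the quoted text
    temperatures: Tuple[int, ...] = (10, 50, 100, 200)   # gamma values listed
    eval_executions: int = 20_000           # runs averaged for reported performance
    lambda_range: Tuple[float, float] = (0.01, 20.0)     # trade-off sweep interval
    d: int = 32                             # "d = 32" as quoted; meaning not stated here


setup = Act2CommSetup()
print(setup)
```

The optimizer is not encoded here because the row names only "an Adam-based lookahead optimizer (Zhang et al., 2019)" without a library or version, which matches the "No" result for Software Dependencies.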