Truncated Gaussian Policy for Debiased Continuous Control

Authors: Ganghun Lee, Minji Kim, Minsu Lee, Byoung-Tak Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical studies and comparisons on various continuous control tasks demonstrate that the truncated Gaussian policies significantly reduce the rate of boundary-action usage, while the scale-adjusted ones successfully balance the bias and counter-bias. The approach generally outperforms the Gaussian policy and shows competitive results compared to other approaches designed to counteract the bias.
Researcher Affiliation | Academia | 1 Interdisciplinary Program in Artificial Intelligence, Seoul National University; 2 Department of Computer Science, Seoul National University; 3 AIIS, Seoul National University; 4 School of AI Convergence, Sungshin Women's University
Pseudocode | No | The paper describes mathematical formulations and textual explanations of the methods but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We conduct experiments on eight continuous control tasks from MuJoCo (Todorov, Erez, and Tassa 2012), including six locomotion tasks (HalfCheetah, Walker2d, Ant, Hopper, Humanoid, and Swimmer) and two manipulation tasks (Pusher and Reacher), with version v4. For high-dimensional action space experiments, we use two locomotion tasks from HumanoidBench (h1hand-walk, h1hand-reach) (61 dimensions) (Sferrazza et al. 2024) and two from the DeepMind Control Suite (DMC) (dog-walk, dog-run) (Tunyasuvunakool et al. 2020) (38 dimensions).
Dataset Splits | No | The paper states: "Each MuJoCo result is averaged over 10 seeds, and HumanoidBench and DeepMind results are averaged over 5 seeds." This describes the aggregation of results over random seeds but does not specify training/validation/test splits. The environments (MuJoCo, HumanoidBench, DMC) imply their own standard setups, but explicit split details are not provided.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU or CPU models) used for running the experiments.
Software Dependencies | No | The paper mentions: "We refer to CleanRL (Huang et al. 2022) for the standard PPO setup." However, it does not specify version numbers for CleanRL or any other software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We use k = 2 and d_min = 0.01 for our SA-TGaussian. The initial scale for all methods is set to σ_init = 0.5, a learnable parameter independent of states, as in the standard PPO setup. The entropy-regularized Gaussian baseline (GEntmax) uses an entropy coefficient of 0.01.
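The core idea behind the paper's truncated Gaussian policy can be sketched in a few lines: instead of clipping or squashing a Gaussian sample into the bounded action range (which piles probability mass onto the boundary actions), the density is truncated and renormalized inside the bounds. The sketch below is a minimal illustration using SciPy's `truncnorm`, not the authors' implementation; the function name and the choice of σ = 0.5 (matching the paper's σ_init) are assumptions for demonstration.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_truncated_gaussian(mu, sigma, low=-1.0, high=1.0, rng=None):
    """Sample an action from a Gaussian truncated to [low, high].

    Hypothetical helper, not the paper's code. Truncation renormalizes
    the density inside the bounds, so no probability mass accumulates
    at the boundary actions (unlike clipping a plain Gaussian).
    """
    # truncnorm parameterizes the bounds in standardized units.
    a = (low - mu) / sigma
    b = (high - mu) / sigma
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)

rng = np.random.default_rng(0)
# Mean near the boundary, sigma = 0.5 as in the paper's initial scale:
# a clipped Gaussian would emit many exact boundary actions here,
# while the truncated one never does.
actions = [sample_truncated_gaussian(0.9, 0.5, rng=rng) for _ in range(1000)]
assert all(-1.0 <= x <= 1.0 for x in actions)
```

A clipped-Gaussian policy with the same μ and σ would return the exact value 1.0 for a large fraction of these samples, which is the boundary-action bias the paper measures and mitigates.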