Generation of Geodesics with Actor-Critic Reinforcement Learning to Predict Midpoints
Authors: Kazumi Kasaura
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove the soundness of our approach and show experimentally that the proposed method outperforms existing methods on several planning tasks, including path planning for agents with complex kinematics and motion planning for multi-degree-of-freedom robot arms. Keywords: Path Planning, Finsler Manifold, Riemannian Manifold, All-Pairs Shortest Paths, Sub-Goal [...] We experimentally compared our proposed method, on five path (or motion) planning tasks, to sequential generation with goal-conditioned reinforcement learning and midpoint tree generation trained by a policy gradient method without a critic. Two tasks involved continuous metrics or constraints (local planning), while the other three involved collision avoidance (global planning). |
| Researcher Affiliation | Industry | Kazumi Kasaura EMAIL OMRON SINIC X Corporation Nagase Hongo Building 3F 5-24-5 Hongo, Bunkyo-ku Tokyo-to, Japan 113-0033 |
| Pseudocode | Yes | Algorithm 1 Actor-Critic Midpoint Learning |
| Open Source Code | Yes | The codes used in the experiments for this paper are available at https://github.com/omron-sinicx/midpoint_learning. |
| Open Datasets | No | The paper defines various environments (e.g., Matsumoto Metric, Unidirectional Car-Like Constraints, 2D Domain with Obstacles, 7-DoF Robotic Arm with an Obstacle, Three Agents in the Plane) for its experiments. Within these environments, 100 pairs of points are randomly generated for evaluation. The paper does not provide links, DOIs, or citations to external, pre-existing publicly available datasets, but rather describes setting up custom environments for data generation during the learning process itself. |
| Dataset Splits | No | For each environment, we randomly generated 100 pairs of points from the free space, before the experiment. During training, we evaluated the models regularly by solving the tasks for the prepared pairs and recorded the success rates. |
| Hardware Specification | Yes | The GPUs we used were NVIDIA RTX A5500 for experiments in 5.4.3 and 5.4.4 and NVIDIA RTX A6000 for experiments in 5.4.1, 5.4.2, and 5.4.5. |
| Software Dependencies | No | The environments were implemented with NumPy and PyTorch (Paszke et al., 2019). To implement the robotic arm environment, we used PyTorch Kinematics (Zhong et al., 2023). We also used Robotics Toolbox for Python (Corke and Haviland, 2021) for visualization of the robotic arm motion. We used PPO implemented in Stable Baselines3 (Raffin et al., 2019), which uses PyTorch. We modified the implementation of subgoal-tree policy gradient (SGT-PG) by the authors, available at https://github.com/tomjur/SGT-PG, which uses TensorFlow (Abadi et al., 2016). The paper mentions several software packages like PyTorch, NumPy, Stable Baselines3, and TensorFlow, but it does not specify concrete version numbers for any of these dependencies. |
| Experiment Setup | Yes | We set the number of epochs Nepochs to 10 and the batch size to 256. The actor network outputs a Gaussian distribution with a diagonal covariance matrix on the state representation space. Both the actor and critic networks are multilayer perceptrons. The hidden layers were two of size 64 for 5.4.1 and three of sizes 400, 300, 300 for the other environments. ReLU was selected as the activation function... Adam (Kingma and Ba, 2014) was used as the optimizer. The learning rate was tuned to 3e-5 for 5.4.1 and to 1e-6 for other environments. [...] For this environment, we set the number of segments n = 64 (Dmax = 6), the proximity threshold ε = 0.1, and the total number of timesteps T = 2e7. |
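The recursive midpoint-tree generation that the paper's setup refers to (n = 64 segments, corresponding to a maximum recursion depth Dmax = 6) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: `predict_midpoint` stands in for the learned actor and here simply returns the straight-line midpoint, whereas in the paper the actor is trained to output geodesic midpoints.

```python
def predict_midpoint(start, goal):
    # Placeholder for the learned actor: here, the straight-line midpoint.
    # In the paper, the actor outputs a Gaussian over the state space and
    # is trained so that its mean approaches the geodesic midpoint.
    return [(s + g) / 2.0 for s, g in zip(start, goal)]

def midpoint_tree_path(start, goal, depth):
    # Recursively subdivide (start, goal) into 2**depth segments by
    # querying the midpoint predictor, building a sub-goal tree.
    if depth == 0:
        return [start, goal]
    mid = predict_midpoint(start, goal)
    left = midpoint_tree_path(start, mid, depth - 1)
    right = midpoint_tree_path(mid, goal, depth - 1)
    return left + right[1:]  # drop the duplicated shared midpoint

path = midpoint_tree_path([0.0, 0.0], [1.0, 1.0], depth=6)
print(len(path))  # 2**6 + 1 = 65 waypoints, i.e. n = 64 segments
```

With a trained actor in place of the placeholder, a single query tree of depth Dmax yields the full waypoint sequence between any evaluation pair, which is how the 100 prepared start-goal pairs are solved at evaluation time.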
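The actor architecture described in the setup row (an MLP with 400-300-300 hidden layers, ReLU activations, and a Gaussian output with diagonal covariance over the state space) could be sketched in PyTorch as below. The class name, the (start, goal) concatenation, and the separate mean/log-std heads are our own assumptions for illustration; only the layer sizes, activation, and output distribution come from the paper.

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Sketch of an actor MLP mapping a (start, goal) pair to a Gaussian
    with diagonal covariance over the state space. Hidden sizes follow
    the reported 400, 300, 300; the interface is hypothetical."""

    def __init__(self, state_dim, hidden=(400, 300, 300)):
        super().__init__()
        layers, in_dim = [], 2 * state_dim  # concatenated start and goal
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        self.body = nn.Sequential(*layers)
        self.mean = nn.Linear(in_dim, state_dim)     # Gaussian mean
        self.log_std = nn.Linear(in_dim, state_dim)  # diagonal covariance

    def forward(self, start, goal):
        h = self.body(torch.cat([start, goal], dim=-1))
        return torch.distributions.Normal(self.mean(h), self.log_std(h).exp())

actor = GaussianActor(state_dim=3)
dist = actor(torch.zeros(256, 3), torch.ones(256, 3))  # batch size 256
print(dist.sample().shape)  # torch.Size([256, 3])
```

The critic would be an analogous MLP with a scalar output head; per the setup row, both networks would be optimized with Adam at the tuned learning rate (3e-5 or 1e-6 depending on the environment).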