Generation of Geodesics with Actor-Critic Reinforcement Learning to Predict Midpoints
Authors: Kazumi Kasaura
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove the soundness of our approach and show experimentally that the proposed method outperforms existing methods on several planning tasks, including path planning for agents with complex kinematics and motion planning for multi-degree-of-freedom robot arms. Keywords: Path Planning, Finsler Manifold, Riemannian Manifold, All-Pairs Shortest Paths, Sub-Goal [...] We experimentally compared our proposed method, on five path (or motion) planning tasks, to sequential generation with goal-conditioned reinforcement learning and midpoint tree generation trained by a policy gradient method without a critic. Two tasks involved continuous metrics or constraints (local planning), while the other three involved collision avoidance (global planning). |
| Researcher Affiliation | Industry | Kazumi Kasaura EMAIL OMRON SINIC X Corporation Nagase Hongo Building 3F 5-24-5 Hongo, Bunkyo-ku Tokyo-to, Japan 113-0033 |
| Pseudocode | Yes | Algorithm 1 Actor-Critic Midpoint Learning |
| Open Source Code | Yes | The codes used in the experiments for this paper are available at https://github.com/omron-sinicx/midpoint_learning. |
| Open Datasets | No | The paper defines various environments (e.g., Matsumoto Metric, Unidirectional Car-Like Constraints, 2D Domain with Obstacles, 7-DoF Robotic Arm with an Obstacle, Three Agents in the Plane) for its experiments. Within these environments, 100 pairs of points are randomly generated for evaluation. The paper does not provide links, DOIs, or citations to external, pre-existing publicly available datasets, but rather describes setting up custom environments for data generation during the learning process itself. |
| Dataset Splits | No | For each environment, we randomly generated 100 pairs of points from the free space, before the experiment. During training, we evaluated the models regularly by solving the tasks for the prepared pairs and recorded the success rates. |
| Hardware Specification | Yes | The GPUs we used were NVIDIA RTX A5500 for experiments in 5.4.3 and 5.4.4 and NVIDIA RTX A6000 for experiments in 5.4.1, 5.4.2, and 5.4.5. |
| Software Dependencies | No | The environments were implemented with NumPy and PyTorch (Paszke et al., 2019). To implement the robotic arm environment, we used PyTorch Kinematics (Zhong et al., 2023). We also used Robotics Toolbox for Python (Corke and Haviland, 2021) for visualization of the robotic arm motion. We used PPO implemented in Stable Baselines3 (Raffin et al., 2019), which uses PyTorch. We modified the implementation of subgoal-tree policy gradient (SGT-PG) by the authors, available at https://github.com/tomjur/SGT-PG, which uses TensorFlow (Abadi et al., 2016). The paper mentions several software packages like PyTorch, NumPy, Stable Baselines3, and TensorFlow, but it does not specify concrete version numbers for any of these dependencies. |
| Experiment Setup | Yes | We set the number of epochs Nepochs to 10 and the batch size to 256. The actor network outputs a Gaussian distribution with a diagonal covariance matrix on the state representation space. Both the actor and critic networks are multilayer perceptrons. The hidden layers were two of size 64 for 5.4.1 and three of sizes 400, 300, 300 for the other environments. ReLU was selected as the activation function... Adam (Kingma and Ba, 2014) was used as the optimizer. The learning rate was tuned to 3e-5 for 5.4.1 and to 1e-6 for other environments. [...] For this environment, we set the number of segments n = 64 (Dmax = 6), the proximity threshold ε = 0.1, and the total number of timesteps T = 2e7. |
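The recursive midpoint-tree generation that the paper's setup refers to (n = 64 segments, corresponding to a maximum recursion depth Dmax = 6) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: `predict_midpoint` stands in for the learned actor and here simply returns the straight-line midpoint, whereas in the paper the actor is trained to output geodesic midpoints.

```python
def predict_midpoint(start, goal):
    # Placeholder for the learned actor: here, the straight-line midpoint.
    # In the paper, the actor outputs a Gaussian over the state space and
    # is trained so that its mean approaches the geodesic midpoint.
    return [(s + g) / 2.0 for s, g in zip(start, goal)]

def midpoint_tree_path(start, goal, depth):
    # Recursively subdivide (start, goal) into 2**depth segments by
    # querying the midpoint predictor, building a sub-goal tree.
    if depth == 0:
        return [start, goal]
    mid = predict_midpoint(start, goal)
    left = midpoint_tree_path(start, mid, depth - 1)
    right = midpoint_tree_path(mid, goal, depth - 1)
    return left + right[1:]  # drop the duplicated shared midpoint

path = midpoint_tree_path([0.0, 0.0], [1.0, 1.0], depth=6)
print(len(path))  # 2**6 + 1 = 65 waypoints, i.e. n = 64 segments
```

With a trained actor in place of the placeholder, a single query tree of depth Dmax yields the full waypoint sequence between any evaluation pair, which is how the 100 prepared start-goal pairs are solved at evaluation time.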
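The actor architecture described in the setup row (an MLP with 400-300-300 hidden layers, ReLU activations, and a Gaussian output with diagonal covariance over the state space) could be sketched in PyTorch as below. The class name, the (start, goal) concatenation, and the separate mean/log-std heads are our own assumptions for illustration; only the layer sizes, activation, and output distribution come from the paper.

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Sketch of an actor MLP mapping a (start, goal) pair to a Gaussian
    with diagonal covariance over the state space. Hidden sizes follow
    the reported 400, 300, 300; the interface is hypothetical."""

    def __init__(self, state_dim, hidden=(400, 300, 300)):
        super().__init__()
        layers, in_dim = [], 2 * state_dim  # concatenated start and goal
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        self.body = nn.Sequential(*layers)
        self.mean = nn.Linear(in_dim, state_dim)     # Gaussian mean
        self.log_std = nn.Linear(in_dim, state_dim)  # diagonal covariance

    def forward(self, start, goal):
        h = self.body(torch.cat([start, goal], dim=-1))
        return torch.distributions.Normal(self.mean(h), self.log_std(h).exp())

actor = GaussianActor(state_dim=3)
dist = actor(torch.zeros(256, 3), torch.ones(256, 3))  # batch size 256
print(dist.sample().shape)  # torch.Size([256, 3])
```

The critic would be an analogous MLP with a scalar output head; per the setup row, both networks would be optimized with Adam at the tuned learning rate (3e-5 or 1e-6 depending on the environment).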