Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Authors: Jinghan Li, Zhicheng Sun, Yadong Mu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is evaluated on the Virtual Home-Env benchmark, showing advanced performance with improved scaling w.r.t. inference-time computation. Code is available at https://github.com/Singularity0104/equilibrium-planner. [...] 4. Experiments [...] Table 1: Performance on Virtual Home-Env without correction. Our planner achieves state-of-the-art performance in most evaluations. |
| Researcher Affiliation | Academia | 1Peking University, China. Correspondence to: Yadong Mu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Inference of Equilibrium Planner |
| Open Source Code | Yes | Code is available at https://github.com/Singularity0104/equilibrium-planner. |
| Open Datasets | Yes | Our method is evaluated on the Virtual Home-Env benchmark (Puig et al., 2018; Liao et al., 2019), demonstrating its advantageous performance with better scaling w.r.t. inference computation than tree-based alternatives. |
| Dataset Splits | Yes | We randomly divide the Virtual Home-Env dataset into a training set and a test set in a 50:50 ratio. To analyze the generalizability of our method, we mainly study the following three subsets of the test set: novel scene set, novel task set, and novel scene and task set. Overall, the dataset contains 735 training trajectories, 468 trajectories within the novel task set, 95 trajectories within the novel scene set, and 62 trajectories within the novel scene and task set. |
| Hardware Specification | No | The paper discusses 'Inference TFLOPS' and 'KV cache' for speeding up inference in Figure 5a and section B.3 respectively, but it does not specify any particular hardware components like GPU models (e.g., NVIDIA A100, RTX 2080 Ti) or CPU models used for the experiments. |
| Software Dependencies | No | Our implementation is consistent with the baseline methods by finetuning from Llama 3 8B (Dubey et al., 2024). The paper mentions the specific LLM (Llama 3 8B) used but does not provide specific version numbers for ancillary software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The equilibrium planner is finetuned for 6 iterations with a learning rate of 0.0002. [...] For the world model, we collect all interacting experiences between the planner and the environment, including plans and feedback, and finetune it for 5 epochs using the same learning rate of 0.0002. [...] A greedy LLM sampling strategy is used in later refinement steps until convergence. [...] The ratio of environmental interactions to world model calls is currently set to 1:1. |
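The random 50:50 train/test split described in the Dataset Splits row could be reproduced along these lines; this is a minimal hedged sketch (the function name, seed, and trajectory count are illustrative, not taken from the paper's released code):

```python
import random

def split_dataset(trajectories, seed=0):
    """Randomly split a list of trajectories into train/test at a 50:50 ratio,
    mirroring the split described for Virtual Home-Env. Seed and helper name
    are assumptions for illustration only."""
    rng = random.Random(seed)
    shuffled = trajectories[:]      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2        # 50:50 ratio
    return shuffled[:mid], shuffled[mid:]

# With 1470 trajectories, a 50:50 split yields 735 training trajectories,
# matching the training-set size quoted above.
train, test = split_dataset(list(range(1470)))
```

Note that the three reported test subsets (468 + 95 + 62 trajectories) are overlapping or partial views of the test half, so they need not sum to its full size.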