System 1.x: Learning to Balance Fast and Slow Planning with Language Models
Authors: Swarnadeep Saha, Archiki Prasad, Justin Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with two diverse planning tasks Maze Navigation and Blocksworld show that our System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and also a symbolic planner (A* search), given a state exploration budget. |
| Researcher Affiliation | Academia | Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal (UNC Chapel Hill) |
| Pseudocode | Yes | Algorithm 1 Training Data Generation for System-1.x Controller |
| Open Source Code | Yes | Code available at https://github.com/swarnaHub/System-1.x |
| Open Datasets | Yes | REPRODUCIBILITY STATEMENT We are making our code and data available in the supplementary material to enable replication of our findings. We randomly generate a balanced dataset of 4K planning problems (split into 3200/400/400 samples) with 5x5 mazes, 40% of the cells containing obstacles, and having optimal plan lengths between 1 to 8. Following the data creation algorithm in Bohnet et al. (2024), we generate problems consisting of 4-7 blocks (without repetition). |
| Dataset Splits | Yes | We randomly generate a balanced dataset of 4K planning problems (split into 3200/400/400 samples) with 5x5 mazes... From there, we create a train/validation/test split of 3000/250/200 samples where the train and the validation split consist of samples with plan lengths 1-6 and the test split consists of samples with plan lengths 7-10. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | Yes | We choose Mistral-7B-Instruct-v0.2 (Jiang et al., 2023) as the base LLM and fine-tune all our components with LoRA (Hu et al., 2021) with a rank of 8 for a maximum of 3 epochs and a batch size of 4, resulting in three adapters for System-1, System-2, and the controller. |
| Experiment Setup | Yes | We choose Mistral-7B-Instruct-v0.2 (Jiang et al., 2023) as the base LLM and fine-tune all our components with LoRA (Hu et al., 2021) with a rank of 8 for a maximum of 3 epochs and a batch size of 4, resulting in three adapters for System-1, System-2, and the controller. |
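As a minimal illustration of the Maze Navigation split reported above (3200/400/400 out of 4K problems), the sketch below shows one way such a seeded random split could be produced. The function name, seed value, and use of index lists are assumptions for illustration, not the authors' actual data pipeline.

```python
import random

def split_dataset(n_samples=4000, sizes=(3200, 400, 400), seed=0):
    """Shuffle sample indices and split them into train/val/test partitions.

    NOTE: hypothetical helper; the paper reports the split sizes but not
    the exact splitting procedure or seed.
    """
    assert sum(sizes) == n_samples, "partition sizes must cover the dataset"
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for reproducibility
    train = indices[: sizes[0]]
    val = indices[sizes[0] : sizes[0] + sizes[1]]
    test = indices[sizes[0] + sizes[1] :]
    return train, val, test

train, val, test = split_dataset()
print(len(train), len(val), len(test))  # 3200 400 400
```

A fixed seed is the usual way to make such a split replicable across runs; for the Blocksworld task the paper instead splits by plan length (train/validation on lengths 1-6, test on 7-10), which would require filtering on a per-sample attribute rather than shuffling indices.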