System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Authors: Swarnadeep Saha, Archiki Prasad, Justin Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments with two diverse planning tasks, Maze Navigation and Blocksworld, show that our System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and also a symbolic planner (A* search), given a state exploration budget." (A budgeted-A* sketch follows the table.)
Researcher Affiliation | Academia | Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal (UNC Chapel Hill)
Pseudocode | Yes | "Algorithm 1: Training Data Generation for System-1.x Controller" (one possible reading of this algorithm is sketched after the table).
Open Source Code | Yes | Code available at https://github.com/swarnaHub/System-1.x
Open Datasets | Yes | "We are making our code and data available in the supplementary material to enable replication of our findings." "We randomly generate a balanced dataset of 4K planning problems (split into 3200/400/400 samples) with 5x5 mazes, 40% of the cells containing obstacles, and optimal plan lengths between 1 and 8. Following the data creation algorithm in Bohnet et al. (2024), we generate problems consisting of 4-7 blocks (without repetition)." (A maze-generation sketch follows the table.)
Dataset Splits | Yes | "We randomly generate a balanced dataset of 4K planning problems (split into 3200/400/400 samples) with 5x5 mazes..." "From there, we create a train/validation/test split of 3000/250/200 samples where the train and the validation split consist of samples with plan lengths 1-6 and the test split consists of samples with plan lengths 7-10."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | Yes | "We choose Mistral-7B-Instruct-v0.2 (Jiang et al., 2023) as the base LLM and fine-tune all our components with LoRA (Hu et al., 2021) with a rank of 8 for a maximum of 3 epochs and a batch size of 4, resulting in three adapters for System-1, System-2, and the controller." (A LoRA configuration sketch follows the table.)
Experiment Setup | Yes | "We choose Mistral-7B-Instruct-v0.2 (Jiang et al., 2023) as the base LLM and fine-tune all our components with LoRA (Hu et al., 2021) with a rank of 8 for a maximum of 3 epochs and a batch size of 4, resulting in three adapters for System-1, System-2, and the controller."
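For context on the baselines in the Research Type row, the sketch below shows what an A* search with a state exploration budget could look like on a grid maze. This is a minimal, hypothetical Python sketch, not the paper's implementation: the grid encoding (0 = free, 1 = obstacle), the Manhattan heuristic, and the budget semantics are illustrative assumptions.

```python
# Hypothetical budgeted A* on a grid maze (0 = free cell, 1 = obstacle).
# The budget caps how many states may be popped and expanded.
import heapq

def astar_with_budget(grid, start, goal, budget):
    """Return (path, states_explored); path is None if the budget runs
    out or the goal is unreachable."""
    def h(cell):  # Manhattan distance, admissible on 4-connected grids
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    seen, explored = set(), 0
    while frontier and explored < budget:
        _, g, cell, path = heapq.heappop(frontier)
        if cell in seen:
            continue
        seen.add(cell)
        explored += 1
        if cell == goal:
            return path, explored
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                heapq.heappush(
                    frontier,
                    (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None, explored
```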
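Algorithm 1 in the paper generates training data for the controller; the released code is the authoritative reference for its exact procedure. The sketch below is only one plausible, assumption-laden reading: each sub-goal's difficulty is scored by how many states the budgeted A* above explores, and the hardest fraction x (the hybridization factor) is delegated to System-2. The function name and the waypoint-based decomposition are hypothetical.

```python
# Hypothetical sketch of controller training-data generation: score each
# sub-goal by A* exploration cost, then delegate the hardest fraction x
# to System-2 and the rest to System-1. Reuses astar_with_budget above.
def make_controller_example(grid, waypoints, x, budget):
    subgoals = list(zip(waypoints, waypoints[1:]))
    difficulty = [astar_with_budget(grid, s, g, budget)[1]
                  for s, g in subgoals]
    n_hard = round(x * len(subgoals))  # hybridization factor x in [0, 1]
    hard = set(sorted(range(len(subgoals)),
                      key=lambda i: -difficulty[i])[:n_hard])
    return [(sg, "system-2" if i in hard else "system-1")
            for i, sg in enumerate(subgoals)]
```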
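The maze dataset described in the Open Datasets row (5x5 grids, 40% obstacle cells, optimal plan lengths 1-8, 3200/400/400 split) could be generated along the following lines. This is a hedged sketch under stated assumptions: rejection sampling is assumed, and the balancing by plan length mentioned in the paper is omitted for brevity.

```python
# Hypothetical maze-problem sampler: 5x5 grid, ~40% obstacles, keep only
# problems whose optimal plan length is 1-8 (length balancing omitted).
import random

def sample_maze_problem(size=5, obstacle_frac=0.4, max_len=8):
    while True:
        grid = [[1 if random.random() < obstacle_frac else 0
                 for _ in range(size)] for _ in range(size)]
        free = [(r, c) for r in range(size) for c in range(size)
                if grid[r][c] == 0]
        if len(free) < 2:
            continue
        start, goal = random.sample(free, 2)
        # A 5x5 grid has at most 25 states, so budget=size*size is full A*.
        path, _ = astar_with_budget(grid, start, goal, budget=size * size)
        if path is not None and 1 <= len(path) - 1 <= max_len:
            return grid, start, goal, path

problems = [sample_maze_problem() for _ in range(4000)]
train, val, test = problems[:3200], problems[3200:3600], problems[3600:]
```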
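Finally, the fine-tuning setup in the last two rows (Mistral-7B-Instruct-v0.2, LoRA rank 8, batch size 4, up to 3 epochs, one adapter each for System-1, System-2, and the controller) maps onto Hugging Face peft roughly as follows. The target modules and lora_alpha below are assumptions; the paper states only the rank, epochs, and batch size.

```python
# Minimal LoRA setup sketch with Hugging Face transformers + peft.
# Stated in the paper: base model, rank 8, batch size 4, <= 3 epochs.
# Assumed here: lora_alpha and which projection matrices to adapt.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                                   # rank stated in the paper
    lora_alpha=16,                         # assumption, not stated
    target_modules=["q_proj", "v_proj"],   # assumption, not stated
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
# Train with a standard causal-LM loop (e.g., transformers.Trainer) for
# up to 3 epochs at a per-device batch size of 4; repeat per component
# to obtain the System-1, System-2, and controller adapters.
```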