AssistanceZero: Scalably Solving Assistance Games
Authors: Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that AssistanceZero outperforms model-free RL algorithms and imitation learning in the Minecraft-based assistance game. In a human study, our AssistanceZero-trained assistant significantly reduces the number of actions participants take to complete building tasks in Minecraft. |
| Researcher Affiliation | Academia | University of California, Berkeley, CA, USA. Correspondence to: Cassidy Laidlaw <cassidy EMAIL>. |
| Pseudocode | No | The paper provides detailed descriptions of the MCTS and training procedures in Appendix A, including formulas and step-by-step explanations, but does not present these in a distinct block explicitly labeled as "Pseudocode", "Algorithm", or a code-like formatted procedure. |
| Open Source Code | Yes | Our code and models are available at https://github.com/cassidylaidlaw/minecraft-building-assistance-game. |
| Open Datasets | Yes | At the start of an episode, the goal is sampled from a dataset of houses based on the CraftAssist dataset (Gray et al., 2019). |
| Dataset Splits | No | We maintain separate train and test datasets to evaluate generalization. At the beginning of each training episode, a goal structure θ is randomly sampled from the training dataset Dtrain. We collect 18 episodes in MBAG of five human subjects building houses randomly selected from Dtrain. We randomly sample a unique goal structure for each participant from our test set Dtest. All training uses houses from the train set Dtrain; thus, we always test human models and assistants on unseen goal structures. However, specific percentages or sample counts for these train/test splits are not provided. |
| Hardware Specification | Yes | When evaluating Assistance Zero assistants, we use only 20 simulations of MCTS, which is roughly the number that can run in real-time with Minecraft on an NVIDIA GeForce 1080 Ti GPU. |
| Software Dependencies | No | We implement all RL and imitation learning algorithms in RLlib (Liang et al., 2018) and PyTorch (Paszke et al., 2019). The paper mentions the software used (RLlib and PyTorch) but does not specify their version numbers. |
| Experiment Setup | Yes | Hyperparameters for BC human models (Table 8): Epochs 30, Dropout 0.7, SGD batch size 128, Learning rate 10⁻³. Hyperparameters for PPO human model training (Table 11): Training iterations 100, Rollout length 500, Number of environments 640, SGD batch size 512, Learning rate 3×10⁻⁴. Hyperparameters for PPO assistant training (Table 12): Training iterations 300, Rollout length 64, Number of environments 256, SGD minibatch size 256, Learning rate 3×10⁻⁴. AssistanceZero hyperparameters for MBAG (Table 14): Training iterations 500, Rollout length per iteration per environment 64, Number of environments 256, Replay buffer size 262,144, SGD batch size 256, Learning rate 10⁻³. |
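For quick reference, the hyperparameters quoted in the row above can be collected into plain Python dicts. This is an illustrative sketch only: the dict and key names are hypothetical conveniences, not identifiers from the paper's released code, though the values match the tables cited (Tables 8, 11, 12, and 14).

```python
# Hypothetical grouping of the hyperparameters reported in the paper.
# Key names are our own; values are taken from Tables 8, 11, 12, and 14.

bc_human_model = {          # Table 8: behavior-cloning human models
    "epochs": 30,
    "dropout": 0.7,
    "sgd_batch_size": 128,
    "learning_rate": 1e-3,
}

ppo_human_model = {         # Table 11: PPO human model training
    "training_iterations": 100,
    "rollout_length": 500,
    "num_environments": 640,
    "sgd_batch_size": 512,
    "learning_rate": 3e-4,
}

ppo_assistant = {           # Table 12: PPO assistant training
    "training_iterations": 300,
    "rollout_length": 64,
    "num_environments": 256,
    "sgd_minibatch_size": 256,
    "learning_rate": 3e-4,
}

assistancezero_mbag = {     # Table 14: AssistanceZero in MBAG
    "training_iterations": 500,
    "rollout_length_per_iteration_per_env": 64,
    "num_environments": 256,
    "replay_buffer_size": 262_144,
    "sgd_batch_size": 256,
    "learning_rate": 1e-3,
}
```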