CraftFactory: A Conditioned Control Policy Benchmark for Compositional Generalization

Authors: Jinbing Hou, Youpeng Zhao, Jian Zhao

AAAI 2025

Reproducibility assessment — each entry below lists the variable, the result, and the LLM's response:
Research Type: Experimental
"To address this gap, we propose CraftFactory, a benchmark designed for evaluating compositional generalization in an interactive control environment. This benchmark introduces a new challenge for testing compositional generalization in a more realistic and comprehensive manner. By leveraging CraftFactory, we aim to promote the development of more advanced compositional generalization methods, thereby contributing to the broader field of general AI. We conducted experiments using our method alongside three popular compositional generalization approaches. The results (see Table 2) indicate that all four methods, including ours, have significant room for improvement."
Researcher Affiliation: Industry
Polixir Technologies, Nanjing, China (EMAIL, EMAIL, EMAIL)
Pseudocode: No
The paper describes methodologies and processes through textual descriptions and mathematical formulations (e.g., Equations 1-8) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes
Code: https://github.com/Aubing-H/craftfactory
Open Datasets: Yes
CraftFactory builds upon the MineRL workbench crafting scenario (Guss et al. 2019), providing a vision-based, interactive, and open-ended environment for AI research.
Dataset Splits: No
The paper states: "For training, approximately 100 trajectories were selected for each task. For testing, we introduced one or two novel test cases." However, it does not provide exact percentages or sample counts for the training, validation, and test sets, nor does it refer to predefined splits in a way that would allow for reproducible data partitioning.
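To illustrate what reproducible data partitioning would look like, here is a minimal, purely hypothetical sketch of a seeded trajectory split; the fractions and seed are assumptions, since the paper specifies neither:

```python
import random

def split_trajectories(trajectories, train_frac=0.9, seed=42):
    """Deterministically partition trajectories into train/test sets.

    The train fraction and seed here are hypothetical; the paper only
    states that ~100 trajectories per task were used for training.
    """
    rng = random.Random(seed)      # fixed seed -> same split every run
    shuffled = list(trajectories)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = split_trajectories(range(100))
print(len(train), len(test))  # 90 10
```

Publishing such a seed and split function (or the resulting index lists) is all that would be needed to make the partitioning reproducible.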
Hardware Specification: No
The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies: No
The paper mentions using a "VPT (Video Pre-Train) backbone" (Baker et al. 2022) and a "FiLM (Feature-wise Linear Modulation) conditioned layer" (Perez et al. 2018; Cai et al. 2023a), which are architectural components or methods. However, it does not specify any software libraries with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed to replicate the experiments.
Experiment Setup: No
The paper mentions that sequences are padded to a uniform length of 10 and that embeddings are transformed into a 512-dimensional embedding. It also states: "For training, approximately 100 trajectories were selected for each task." However, crucial hyperparameters such as the learning rate, batch size, optimizer, number of training epochs, and training schedule are not provided.
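The two setup details the paper does disclose, padding to a uniform length of 10 and a 512-dimensional embedding, can be sketched as follows. This is a minimal illustration using a random lookup table; the vocabulary size, pad token, and table initialization are hypothetical, since the paper does not specify them:

```python
import random

MAX_LEN = 10     # sequences padded to a uniform length of 10
EMBED_DIM = 512  # embeddings transformed into a 512-length embedding

def pad_sequence(tokens, pad_token=0, max_len=MAX_LEN):
    """Right-pad (or truncate) a token sequence to a fixed length."""
    return (list(tokens) + [pad_token] * max_len)[:max_len]

# Hypothetical embedding table: one 512-dim vector per token id.
random.seed(0)
VOCAB_SIZE = 100  # assumed; not stated in the paper
table = [[random.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
         for _ in range(VOCAB_SIZE)]

def embed(token_ids):
    """Look up a 512-dim vector for each (padded) token id."""
    return [table[t] for t in token_ids]

padded = pad_sequence([5, 8, 2])
vectors = embed(padded)
print(len(padded), len(vectors), len(vectors[0]))  # 10 10 512
```

Without the missing hyperparameters (learning rate, batch size, optimizer, epochs), these shapes are the only part of the training pipeline a reader could reconstruct from the paper alone.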