Self-Supervision is All You Need for Solving Rubik’s Cube

Authors: Kyo Takano

TMLR 2023

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. Evidence: "When tested on Rubik's Cube, 15 Puzzle, and 7×7 Lights Out, our method outperformed the previous state-of-the-art method DeepCubeA, improving the trade-off between solution optimality and computational cost, despite significantly less training data. Furthermore, we investigate the scaling law of our Rubik's Cube solver with respect to model size and training data volume."
Researcher Affiliation: Industry. Evidence: "Kyo Takano EMAIL"
Pseudocode: Yes. Evidence: Algorithm 1, the proposed training algorithm:

    Require: G (goal state), M (set of moves), B (number of scrambles per batch),
             N (number of moves per scramble), fθ (DNN parameterized by θ)
    Ensure:  fθ (trained DNN)
    while not converged do
        T ← ∅
        for b = 1 to B do
            S ← G
            for n = 1 to N do
                Sample m ← random(M)
                Update S ← S · m
                Add (S, m) to T
        Update fθ using T
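The scramble-generation loop of Algorithm 1 (including the exclusion of immediately self-canceling moves described in the experiment setup) might be sketched as follows. The move names, `inverse`, and the abstract `apply` function are illustrative stand-ins, not code from the efficientcube repository:

```python
import random

# Quarter-turn move set for Rubik's Cube (illustrative).
MOVES = ["U", "U'", "D", "D'", "L", "L'", "R", "R'", "F", "F'", "B", "B'"]

def inverse(move: str) -> str:
    """Quarter-turn inverse: R <-> R'."""
    return move[:-1] if move.endswith("'") else move + "'"

def generate_batch(apply, goal, num_scrambles: int, scramble_length: int):
    """Collect (state, move) training pairs as in Algorithm 1, skipping
    immediately self-canceling moves (e.g. R' right after R)."""
    pairs = []                                # T in Algorithm 1
    for _ in range(num_scrambles):            # b = 1 .. B
        state, prev = goal, None              # S <- G
        for _ in range(scramble_length):      # n = 1 .. N
            move = random.choice(MOVES)       # sample m from M
            while prev is not None and move == inverse(prev):
                move = random.choice(MOVES)   # resample redundant moves
            state = apply(state, move)        # S <- S . m
            pairs.append((state, move))       # add (S, m) to T
            prev = move
    return pairs  # a DNN update on T (e.g. with Adam) would follow
```

A dummy `apply` (e.g. one that records moves in a tuple) is enough to exercise the loop; in the actual method `apply` would transform a cube state.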
Open Source Code: Yes. Evidence: "Our code is available at github.com/kyo-takano/efficientcube"
Open Datasets: Yes. Evidence: "We utilize the publicly available DeepCubeA dataset on Code Ocean, which comprises 1,000 Rubik's Cube test cases."
Dataset Splits: Yes. Evidence: "We utilize the publicly available DeepCubeA dataset on Code Ocean, which comprises 1,000 Rubik's Cube test cases."
Hardware Specification: Yes. Evidence: "We used a single GPU (NVIDIA T4), whereas Agostinelli et al. (2019) employed four GPUs (NVIDIA TITAN V)."
Software Dependencies: No. The paper mentions the "Adam optimizer (Kingma & Ba, 2014)" but does not provide specific version numbers for any software frameworks (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or other libraries.
Experiment Setup: Yes. Evidence: "The DNN parameters are updated using the Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 10^-3. In this experiment, we set the maximum scramble length to 26, which is God's Number for Rubik's Cube in the quarter-turn metric (Kunkle & Cooperman, 2007). Additionally, the training process excludes clearly redundant or self-canceling permutations, such as R' following R, when generating random scrambles. We train the DNN for 2,000,000 steps with a batch size of 26,000 (26 moves per scramble × 1,000 scrambles per batch), which is equivalent to 2 billion solutions. ... We set a beam width of k and only pass the top k candidate paths to the next depth i + 1, sorted by their cumulative product of probabilities. ... To investigate the trade-off between the number of nodes expanded and the optimality of solutions, we systematically increase the beam width k from 2^0 to 2^18 in powers of 2."
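The beam search described above (keep only the top-k candidate paths at each depth, ranked by cumulative product of move probabilities) might be sketched as follows. `predict_probs` stands in for the trained DNN, and all function and parameter names here are assumptions for illustration, not the paper's code:

```python
import math

def beam_search(start, predict_probs, apply_move, is_goal, moves, k, max_depth):
    """Expand every path in the beam one move, then pass only the top-k
    candidates to the next depth, ranked by cumulative log-probability."""
    if is_goal(start):
        return []
    beam = [(0.0, start, [])]  # (cumulative log-prob, state, path so far)
    for _ in range(max_depth):
        candidates = []
        for logp, state, path in beam:
            for move, p in zip(moves, predict_probs(state)):
                if p <= 0.0:
                    continue  # log(0) is undefined; prune impossible moves
                nxt = apply_move(state, move)
                if is_goal(nxt):
                    return path + [move]
                candidates.append((logp + math.log(p), nxt, path + [move]))
        # Sorting by summed log-probabilities is equivalent to sorting by
        # the cumulative product of probabilities, but numerically stable.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return None  # no solution found within max_depth
```

On a toy domain (integer states, moves "+1"/"+2" with uniform probabilities), the search finds a path whose moves sum to the goal value; in the paper's setting the states would be cube configurations and the probabilities would come from the DNN.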