Self-Supervision is All You Need for Solving Rubik’s Cube
Authors: Kyo Takano
TMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When tested on Rubik s Cube, 15 Puzzle, and 7 7 Lights Out, our method outperformed the previous state-of-the-art method Deep Cube A, improving the trade-off between solution optimality and computational cost, despite significantly less training data. Furthermore, we investigate the scaling law of our Rubik s Cube solver with respect to model size and training data volume. |
| Researcher Affiliation | Industry | Kyo Takano EMAIL |
| Pseudocode | Yes | Algorithm 1 The proposed training algorithm. G: Goal state M: Set of moves B: Number of scrambles per batch N: Number of moves per scramble fθ: DNN parameterized by θ Ensure: fθ: Trained DNN while not converged do T for b = 1 to B do S G for n = 1 to N do Sample m random(M) Update S S m Add (S, m) to T Update fθ using T |
| Open Source Code | Yes | Our code is available at github.com/kyo-takano/efficientcube |
| Open Datasets | Yes | We utilize the publicly available Deep Cube A dataset on Code Ocean2, which comprises 1, 000 Rubik s Cube test cases. |
| Dataset Splits | Yes | We utilize the publicly available Deep Cube A dataset on Code Ocean2, which comprises 1, 000 Rubik s Cube test cases. |
| Hardware Specification | Yes | We used a single GPU (NVIDIA T4), whereas Agostinelli et al. (2019) employed four GPUs (NVIDIA TITAN V). |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2014)' but does not provide specific version numbers for any software frameworks (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or other libraries. |
| Experiment Setup | Yes | The DNN parameters are updated using Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 10 3. In this experiment, we set the maximum scramble length to 26, which is God s Number for Rubik s Cube in the quarter-turn metric (Kunkle & Cooperman, 2007). Additionally, the training process excludes clearly redundant or self-canceling permutations, such as R following R , when generating random scrambles. We train the DNN for 2, 000, 000 steps with a batch size of 26, 000 (26 moves per scramble 1, 000 scrambles per batch), which is equivalent to 2 billion solutions. ... We set a beam width of k and only pass the top k candidate paths to the next depth i + 1, sorted by their cumulative product of probabilities. ... To investigate the trade-off between the number of nodes expanded and the optimality of solutions, we systematically increase the beam width k from 20 to 218 in powers of 2. |