Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Solving the Rubik's Cube with Approximate Policy Iteration
Authors: Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves, less than or equal to solvers that employ human domain knowledge. Our algorithm, called Autodidactic Iteration (ADI), trains a neural network value and policy function through an iterative process. These neural networks are the "fast policy" of DPI described earlier. After the network is trained, it is combined with MCTS to effectively solve the Rubik's Cube. We call the resulting solver DeepCube. (A hedged sketch of ADI's target computation appears after this table.) |
| Researcher Affiliation | Academia | Stephen McAleer, Department of Statistics, University of California, Irvine (EMAIL); Forest Agostinelli, Department of Computer Science, University of California, Irvine (EMAIL); Alexander Shmakov, Department of Computer Science, University of California, Irvine (EMAIL); Pierre Baldi, Department of Computer Science, University of California, Irvine (EMAIL) |
| Pseudocode | Yes | Algorithm 1: Autodidactic Iteration |
| Open Source Code | No | The paper does not include an unambiguous statement or a direct link to a source-code repository for the methodology described in this paper. |
| Open Datasets | No | The paper generates its own training data by starting from the solved state and scrambling the cube, rather than using a pre-existing, publicly accessible dataset with concrete access information. (See the scramble-generation sketch after this table.) |
| Dataset Splits | No | The paper mentions 'training samples' and evaluating on 'randomly scrambled cubes' but does not specify exact dataset splits (percentages or counts) for training, validation, or testing. |
| Hardware Specification | Yes | Our training machine was a 32-core Intel Xeon E5-2620 server with three NVIDIA Titan XP GPUs. |
| Software Dependencies | No | The paper mentions the use of the RMSProp optimizer and a feedforward network, but it does not specify versions for any key software libraries, frameworks, or dependencies (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | No | The paper mentions general training details, including the RMSProp optimizer, mean squared error loss, softmax cross-entropy loss, and the number of training iterations (2,000,000), and it names the exploration (c) and virtual loss (ν) hyperparameters, but it does not provide numerical values for these hyperparameters or other system-level training settings. (Hedged sketches of the training step and MCTS selection rule follow this table.) |
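
The "Research Type" and "Pseudocode" rows describe Autodidactic Iteration (ADI), which trains a joint value-and-policy network by backing up one step from each training state. The sketch below shows that target computation under stated assumptions: `apply_move`, `is_solved`, and `value_fn` are hypothetical helpers standing in for a cube environment and the current network, and the paper's Algorithm 1 remains the authoritative statement.

```python
import numpy as np

ACTIONS = range(12)  # the 12 face turns of the Rubik's Cube

def adi_targets(states, value_fn, apply_move, is_solved):
    """Compute ADI training targets for a batch of scrambled states.

    apply_move(s, a), is_solved(s), and value_fn(s) are hypothetical
    helpers: the child state under move a, a goal test, and the current
    network's scalar value estimate, respectively.
    """
    value_targets, policy_targets = [], []
    for x in states:
        children = [apply_move(x, a) for a in ACTIONS]
        # Reward: +1 for reaching the solved state, -1 for every other move.
        rewards = np.array([1.0 if is_solved(c) else -1.0 for c in children])
        # Bootstrap non-terminal children with the current value network.
        values = np.array([0.0 if is_solved(c) else value_fn(c) for c in children])
        backed_up = rewards + values
        value_targets.append(backed_up.max())            # value-head target
        policy_targets.append(int(backed_up.argmax()))   # policy-head target
    return np.array(value_targets), np.array(policy_targets)
```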
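The "Open Datasets" row notes that training data is self-generated by scrambling from the solved state rather than drawn from a public dataset. A minimal sketch, again assuming the hypothetical `apply_move` helper:

```python
import random

def scramble_from_solved(solved_state, apply_move, k, actions=range(12)):
    """Generate k training states by a random walk away from the solved
    cube, matching the paper's description of self-generated data."""
    states, s = [], solved_state
    for _ in range(k):
        s = apply_move(s, random.choice(list(actions)))
        states.append(s)
    return states
```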
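The "Software Dependencies" and "Experiment Setup" rows note that the paper names RMSProp and its two losses but gives no library versions or hyperparameter values. The sketch below uses PyTorch purely for illustration; the network interface, learning rate, and 1:1 loss weighting are assumptions, not reported values.

```python
import torch
import torch.nn.functional as F

def train_step(net, optimizer, states, value_targets, policy_targets):
    """One update with the losses the paper names: mean squared error on
    the value head plus softmax cross entropy on the policy head. The
    two-headed network interface and equal loss weighting are assumptions."""
    value_pred, policy_logits = net(states)
    value_loss = F.mse_loss(value_pred.squeeze(-1), value_targets)
    policy_loss = F.cross_entropy(policy_logits, policy_targets)
    loss = value_loss + policy_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The paper specifies RMSProp but no learning rate; 1e-4 is a placeholder.
# optimizer = torch.optim.RMSprop(net.parameters(), lr=1e-4)
```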
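The exploration constant c and virtual loss ν enter during the MCTS phase of the combined solver. The selection step below is a standard PUCT-style rule with virtual loss, consistent with the paper's description but reconstructed rather than quoted, so treat it as an approximation.

```python
import math
import numpy as np

def select_move(N, W, L, P, c, nu):
    """One MCTS selection step with virtual loss.

    N, W, L, P are per-move numpy arrays: visit counts, backed-up values,
    accumulated virtual loss, and network policy priors. c and nu are the
    exploration and virtual-loss hyperparameters whose numerical values
    the paper does not report.
    """
    U = c * P * math.sqrt(N.sum()) / (1 + N)
    Q = W - L                       # virtual loss penalizes in-flight edges
    a = int(np.argmax(U + Q))
    L[a] += nu                      # applied until the backup removes it
    return a
```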