Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Solving the Rubik's Cube with Approximate Policy Iteration

Authors: Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi

ICLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves, less than or equal to solvers that employ human domain knowledge. Our algorithm, called Autodidactic Iteration (ADI), trains a neural network value and policy function through an iterative process. These neural networks are the 'fast policy' of DPI described earlier. After the network is trained, it is combined with MCTS to effectively solve the Rubik's Cube. We call the resulting solver DeepCube."
Researcher Affiliation | Academia | "Stephen McAleer, Department of Statistics, University of California, Irvine, EMAIL; Forest Agostinelli, Department of Computer Science, University of California, Irvine, EMAIL; Alexander Shmakov, Department of Computer Science, University of California, Irvine, EMAIL; Pierre Baldi, Department of Computer Science, University of California, Irvine, EMAIL"
Pseudocode | Yes | "Algorithm 1: Autodidactic Iteration"
Open Source Code | No | The paper does not include an unambiguous statement or a direct link to a source-code repository for the described methodology.
Open Datasets | No | The paper generates its own training data by scrambling the cube from the solved state, rather than using a pre-existing, publicly accessible dataset with concrete access information.
Dataset Splits | No | The paper mentions 'training samples' and evaluation on 'randomly scrambled cubes' but does not specify exact dataset splits (percentages or counts) for training, validation, or testing.
Hardware Specification | Yes | "Our training machine was a 32-core Intel Xeon E5-2620 server with three NVIDIA Titan XP GPUs."
Software Dependencies | No | The paper mentions the RMSProp optimizer and a feedforward network, but it does not specify versions for any key software libraries, frameworks, or dependencies (e.g., PyTorch, TensorFlow, or Python).
Experiment Setup | No | The paper gives general training details (RMSProp optimizer, mean squared error loss, softmax cross-entropy loss, and 2,000,000 training iterations) and names the exploration (c) and virtual loss (ν) hyperparameters, but it does not report numerical values for these hyperparameters or other system-level training settings.
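The Autodidactic Iteration procedure summarized in the table (training states generated by scrambling the solved state, and value targets computed by a one-step lookahead over all children) can be sketched on a toy puzzle. Everything here is an illustrative assumption: the two-triple "mini-cube" state, the four-move set, and the tabular value function stand in for the paper's 3x3x3 cube and deep network trained with RMSProp.

```python
import random

# Toy stand-in for the Rubik's Cube: six stickers split into two blocks
# of three, each block cyclically rotatable in either direction.
# (Assumed environment for illustration only.)
SOLVED = (0, 1, 2, 3, 4, 5)
MOVES = [(0, 1), (0, -1), (3, 1), (3, -1)]  # (block index, direction)

def apply_move(state, move):
    """Cyclically rotate one block of three stickers."""
    i, direction = move
    s = list(state)
    if direction == 1:
        s[i], s[i + 1], s[i + 2] = s[i + 2], s[i], s[i + 1]
    else:
        s[i], s[i + 1], s[i + 2] = s[i + 1], s[i + 2], s[i]
    return tuple(s)

def reward(state):
    # +1 for reaching the solved state, -1 for every other state.
    return 1.0 if state == SOLVED else -1.0

def generate_scrambles(max_depth, n):
    """ADI's self-generated training data: states sampled by scrambling
    the solved state a random number of moves."""
    states = []
    for _ in range(n):
        s = SOLVED
        for _ in range(random.randint(1, max_depth)):
            s = apply_move(s, random.choice(MOVES))
        states.append(s)
    return states

def adi_iteration(value, states):
    """One ADI update: each sampled state's value target is the best
    reward-plus-value over its children (one-step lookahead)."""
    targets = {}
    for s in states:
        if s == SOLVED:
            targets[s] = 0.0
            continue
        children = [apply_move(s, m) for m in MOVES]
        targets[s] = max(reward(c) + value.get(c, 0.0) for c in children)
    value.update(targets)  # tabular stand-in for a gradient step
    return value

random.seed(0)
value = {}
for _ in range(50):
    value = adi_iteration(value, generate_scrambles(max_depth=5, n=64))
```

In the paper the learned value (and policy) network then guides an MCTS solver; in this tabular sketch the same effect is visible directly, since states one move from solved converge to a higher value than states two moves away.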