Learning to Search from Demonstration Sequences
Authors: Dixant Mittal, Liwei Kang, Wee Sun Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study problems from these two scenarios, including Game of 24, 2D grid navigation, and Procgen games, to understand when D-TSN is more helpful. Through our experiments, we show that D-TSN is effective, especially when the world model with a latent state space is jointly learned. The code is available at https://github.com/dixantmittal/differentiable-tree-search-network. Section 4, titled 'EXPERIMENTS', further details empirical evaluations with results presented in tables such as Table 1, Table 2, and Table 3. |
| Researcher Affiliation | Collaboration | Dixant Mittal1,2 Liwei Kang1 Wee Sun Lee1 1National University of Singapore 2 Moovita EMAIL. One author, Dixant Mittal, is affiliated with both National University of Singapore (an academic institution) and Moovita (an industry affiliation). |
| Pseudocode | Yes | A DIFFERENTIABLE TREE SEARCH NETWORK ALGORITHM. A.1 DIFFERENTIABLE TREE SEARCH NETWORK PSEUDO-CODE. Algorithm 1: Differentiable Tree Search (D-TSN) |
| Open Source Code | Yes | The code is available at https://github.com/dixantmittal/differentiable-tree-search-network. |
| Open Datasets | No | For Game of 24, the authors state: 'We collected all valid Game of 24 problems and their solutions through an exhaustive search of all combinations, then randomly selected 530 problems for evaluation. The remaining 527 problems have 16k valid solutions, from which we randomly sampled a subset for training.' For Navigation and Procgen, they state: 'We use a behavior policy, which can be optimal or sub-optimal, to collect demonstration sequences for training.' The paper describes collecting its own datasets for experiments and does not provide public access links or specific citations for these collected datasets. |
| Dataset Splits | Yes | We collected all valid Game of 24 problems and their solutions through an exhaustive search of all combinations, then randomly selected 530 problems for evaluation. The remaining 527 problems have 16k valid solutions, from which we randomly sampled a subset for training. |
| Hardware Specification | No | The paper states: 'Notably, greater depths, such as 3 or more, are infeasible since the resulting computation graph exceeds the memory capacity (roughly 11GB) of a typical consumer-grade GPU.' This describes a general limitation or characteristic of a type of hardware, rather than explicitly specifying the hardware used for the experiments conducted in the paper. |
| Software Dependencies | No | The paper mentions using 'Llama3-8B (Dubey et al., 2024)' and refers to 'Phasic Policy Gradient (PPG) (Cobbe et al., 2021)' as a baseline, but it does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow) used in the implementation of D-TSN or its experiments. |
| Experiment Setup | Yes | We train D-TSN using 8 search iterations in training and compare the resulting value function with a supervised fine-tuned model... For our empirical evaluations, we set the maximum limit for search iterations at 10. For our evaluations, we perform 10 search iterations for each input state. To train this model, we compute the Q-value, Q_θ, without performing the search and optimize the loss defined as: L_Search = λ₁L_Q + λ₂L_D + λ₃L_{T_θ} + λ₄L_{R_θ}. For evaluations, we adhere to a depth of 2 for TreeQN... We limit the number of trajectories to 1000 for each domain to evaluate the sample complexity and generalization capabilities of each method. We fine-tune the hyperparameters, λ₁, λ₂, λ₃ and λ₄, using grid search on a log scale. |
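
The loss quoted in the Experiment Setup row is a weighted sum of four components whose weights are tuned by grid search on a log scale. A minimal sketch of that combination (the function name, weight values, and grid are illustrative assumptions, not the authors' implementation):

```python
# Illustrative sketch of the combined search loss
#   L_Search = λ1·L_Q + λ2·L_D + λ3·L_Tθ + λ4·L_Rθ
# All names and numbers below are hypothetical.

def combined_search_loss(l_q, l_d, l_t, l_r, lambdas):
    """Weighted sum of the four loss components reported in the paper."""
    lam1, lam2, lam3, lam4 = lambdas
    return lam1 * l_q + lam2 * l_d + lam3 * l_t + lam4 * l_r

# The paper tunes λ1..λ4 by grid search on a log scale; a plausible
# candidate grid for each weight might look like:
log_scale_grid = [0.01, 0.1, 1.0, 10.0]
```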