Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning
Authors: Xinsong Feng, Zihan Yu, Yanhai Xiong, Haipeng Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources. (Section 4, Experimental Results) |
| Researcher Affiliation | Academia | Xinsong Feng¹, Zihan Yu², Yanhai Xiong³, Haipeng Chen³; ¹UCLA, ²The University of Hong Kong, ³William & Mary. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | The pseudocode of the training algorithm is shown in Algorithm 1. Algorithm 1 WS-option: Overall training Algorithm 2 WS-option: Run episode Algorithm 3 WS-option: Store transitions |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to repositories for its implementation. The conclusion mentions a limitation and future work regarding graph embedding techniques, implying code is not yet provided. |
| Open Datasets | No | The datasets used in the experiments are randomly generated using the erdos_renyi_graph() function from Python's NetworkX library, with a probability of 0.01. ... The datasets used in this problem (coordinates of the city) are also randomly generated using the random.rand() function from Python's NumPy library with a scale of 100. ... To demonstrate this, we evaluate our model on the power2500 dataset, which has 2500 nodes. |
| Dataset Splits | No | The paper mentions training on 'a set of previously seen training graphs' and generalizing 'to unseen test graphs', and also training 'on graphs with 50-100 nodes' for generalization to larger ones. However, it does not specify any concrete percentages, sample counts, or methodologies for how these training/test splits are performed or defined. |
| Hardware Specification | Yes | All experiments on the two COPs are implemented in Python and conducted on two machines. One has an NVIDIA GeForce RTX 4090 GPU and a 24-core Intel i9-13900K with 24 GB of RAM. The other has an NVIDIA V100 GPU and a 32-core Intel Xeon E5-2683 v4 with 256 GB of RAM. |
| Software Dependencies | No | The paper mentions implementing in Python and using the 'erdos_renyi_graph()' function from Python's NetworkX library and the 'random.rand()' function from Python's NumPy library. However, it does not provide specific version numbers for Python, NetworkX, or NumPy. |
| Experiment Setup | Yes | To ensure the stability of the learning process in both problems, we employ Double Deep Q-Networks with an update frequency of 10 episodes. The ϵ-greedy strategy we use is initialized with ϵ = 0.9 and decays by a factor of 0.98 after each single training, with a minimum threshold of 0.1. We train each model for 20 epochs and 10 episodes. Table 6: Experiment hyper-parameters for both problems (AIM / RP): Discount factor (γ): 0.99 / 0.99; Learning rate (higher-layer): 10⁻³ / 10⁻²; Learning rate (lower-layer): 10⁻⁴ / 10⁻⁴; Batch size: 32 / 32; DDQN update frequency: 10 episodes / 10 episodes; Initial ϵ in ϵ-greedy: 0.9 / 0.9; Decay factor for ϵ: 0.98 / 0.98; Minimum ϵ threshold: 0.1 / 0.1; Epochs: 20 / 20; Episodes: 10 / 10 |
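For context, the instance-generation procedure quoted in the Open Datasets row (Erdős–Rényi graphs with edge probability 0.01, and uniform city coordinates scaled to 100) can be sketched as follows. The function names, instance sizes, and seed are illustrative assumptions, not the authors' released code:

```python
import networkx as nx
import numpy as np

def make_er_graph(n_nodes, p=0.01, seed=0):
    # Erdos-Renyi random graph with edge probability 0.01,
    # as reported for the AIM instances.
    return nx.erdos_renyi_graph(n_nodes, p, seed=seed)

def make_city_coords(n_cities, scale=100.0):
    # Uniform city coordinates in [0, scale)^2, as reported
    # for the routing-problem (RP) instances.
    return np.random.rand(n_cities, 2) * scale

graph = make_er_graph(100)
coords = make_city_coords(50)
print(graph.number_of_nodes(), coords.shape)  # 100 (50, 2)
```

With p = 0.01 such graphs are very sparse, which is consistent with the paper training on graphs of 50–100 nodes and evaluating on larger ones such as power2500.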
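The ϵ-greedy schedule reported in the Experiment Setup row (start at 0.9, multiply by 0.98 after each training step, floor at 0.1) can be sketched as a small generator. This is a minimal illustration of the stated schedule, not the authors' implementation:

```python
def epsilon_schedule(start=0.9, decay=0.98, floor=0.1):
    # Yields the exploration rate for successive training steps:
    # multiplicative decay by `decay`, clipped below at `floor`.
    eps = start
    while True:
        yield eps
        eps = max(floor, eps * decay)

sched = epsilon_schedule()
first = next(sched)            # 0.9 on the first step
for _ in range(200):
    last = next(sched)         # after enough steps, the 0.1 floor is reached
```

Since 0.9 × 0.98^n drops below 0.1 after roughly 109 steps, the agent spends most of later training at the minimum exploration rate of 0.1.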