Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

Authors: Xinsong Feng, Zihan Yu, Yanhai Xiong, Haipeng Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources. (Section 4, Experimental Results)
Researcher Affiliation | Academia | Xinsong Feng (UCLA), Zihan Yu (The University of Hong Kong), Yanhai Xiong (William & Mary), Haipeng Chen (William & Mary). EMAIL, EMAIL, EMAIL
Pseudocode | Yes | The pseudocode of the training algorithm is shown in Algorithm 1. Algorithm 1: WS-option, overall training; Algorithm 2: WS-option, run episode; Algorithm 3: WS-option, store transitions.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to repositories for its implementation. The conclusion mentions a limitation and future work regarding graph embedding techniques, implying code is not yet provided.
Open Datasets | No | The datasets used in the experiments are randomly generated using the erdos_renyi_graph() function from Python's NetworkX library, with a probability of 0.01. ... The datasets used in this problem (coordinates of the cities) are also randomly generated using the random.rand() function from Python's NumPy library with a scale of 100. ... To demonstrate this, we evaluate our model on the power2500 dataset, which has 2500 nodes.
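The quoted generation procedure can be sketched in a few lines of Python. The graph size, instance count, and seed below are illustrative assumptions, not values taken from the paper:

```python
# Sketch of the random instance generation quoted above.
# Graph sizes, instance counts, and the seed are illustrative assumptions.
import numpy as np
import networkx as nx

# AIM-style instances: Erdos-Renyi graphs with edge probability 0.01.
graphs = [nx.erdos_renyi_graph(n=100, p=0.01, seed=s) for s in range(5)]

# RP-style instances: city coordinates drawn uniformly, scaled to 100,
# i.e. points in a 100 x 100 square (here, 50 cities in 2D).
cities = np.random.rand(50, 2) * 100

print(graphs[0].number_of_nodes(), cities.shape)
```

With p = 0.01 and n = 100 nodes, the expected number of edges is roughly 0.01 * C(100, 2) ≈ 50, so these graphs are quite sparse.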
Dataset Splits | No | The paper mentions training on 'a set of previously seen training graphs' and generalizing 'to unseen test graphs', and also training 'on graphs with 50-100 nodes' for generalization to larger ones. However, it does not specify any concrete percentages, sample counts, or methodologies for how these training/test splits are performed or defined.
Hardware Specification | Yes | All experiments on the two COPs are implemented in Python and conducted on two machines: one with an NVIDIA GeForce RTX 4090 GPU and a 24-core Intel i9-13900K CPU with 24 GB of RAM; the other with an NVIDIA V100 GPU and a 32-core Intel Xeon E5-2683 v4 CPU with 256 GB of RAM.
Software Dependencies | No | The paper mentions implementing in Python and using the 'erdos_renyi_graph() function from Python's NetworkX library' and the 'random.rand() function from Python's NumPy library'. However, it does not provide specific version numbers for Python, NetworkX, or NumPy.
Experiment Setup | Yes | To ensure the stability of the learning process in both problems, we employ Double Deep Q-Networks with an update frequency of 10 episodes. The ε-greedy strategy is initialized with ε = 0.9 and decays by a factor of 0.98 after each training step, with a minimum threshold of 0.1. We train each model for 20 epochs of 10 episodes each.

Table 6: Experiment hyper-parameters for both problems

Parameter                      AIM          RP
Discount factor (γ)            0.99         0.99
Learning rate (higher-layer)   1e-3         1e-2
Learning rate (lower-layer)    1e-4         1e-4
Batch size                     32           32
DDQN update frequency          10 episodes  10 episodes
Initial ε in ε-greedy          0.9          0.9
Decay factor for ε             0.98         0.98
Minimum ε threshold            0.1          0.1
Epochs                         20           20
Episodes                       10           10
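The ε schedule and DDQN target-update cadence described above can be sketched as follows. The loop structure and the assumption of one ε decay per episode are illustrative; the paper's Algorithm 1 may interleave these steps differently:

```python
# Sketch of the epsilon-greedy decay and DDQN target-update cadence
# from the hyper-parameters above. Loop structure is an assumption.
EPS_INIT, EPS_DECAY, EPS_MIN = 0.9, 0.98, 0.1
EPOCHS, EPISODES, TARGET_UPDATE_EVERY = 20, 10, 10

eps = EPS_INIT
episode_count = 0
for epoch in range(EPOCHS):
    for episode in range(EPISODES):
        episode_count += 1
        # ... run one episode, acting epsilon-greedily with probability eps ...
        # Decay epsilon after each training step, clipped at the minimum.
        eps = max(EPS_MIN, eps * EPS_DECAY)
        # Sync the DDQN target network every 10 episodes (placeholder here;
        # e.g. target_net.load_state_dict(online_net.state_dict()) in PyTorch).
        if episode_count % TARGET_UPDATE_EVERY == 0:
            pass

print(episode_count, eps)
```

With these values, ε reaches its 0.1 floor after about 109 of the 200 total episodes (0.9 × 0.98^k < 0.1 for k ≥ 109), so roughly the second half of training runs at the minimum exploration rate.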