Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning
Authors: Xinsong Feng, Zihan Yu, Yanhai Xiong, Haipeng Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources. (Section 4, Experimental Results) |
| Researcher Affiliation | Academia | Xinsong Feng¹, Zihan Yu², Yanhai Xiong³, Haipeng Chen³; ¹UCLA, ²The University of Hong Kong, ³William & Mary. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | The pseudocode of the training algorithm is shown in Algorithm 1. Algorithm 1 WS-option: Overall training Algorithm 2 WS-option: Run episode Algorithm 3 WS-option: Store transitions |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to repositories for its implementation. The conclusion mentions a limitation and future work regarding graph embedding techniques, implying code is not yet provided. |
| Open Datasets | No | The datasets used in the experiments are randomly generated using the erdos_renyi_graph() function from Python's NetworkX library, with a probability of 0.01. ... The datasets used in this problem (coordinates of the city) are also randomly generated using the random.rand() function from Python's NumPy library with a scale of 100. ... To demonstrate this, we evaluate our model on the power2500 dataset, which has 2500 nodes. |
| Dataset Splits | No | The paper mentions training on 'a set of previously seen training graphs' and generalizing 'to unseen test graphs', and also training 'on graphs with 50-100 nodes' for generalization to larger ones. However, it does not specify any concrete percentages, sample counts, or methodologies for how these training/test splits are performed or defined. |
| Hardware Specification | Yes | All experiments on the two COPs are implemented in Python and conducted on two machines. One has an NVIDIA GeForce RTX 4090 GPU and a 24-core Intel i9-13900K with 24 GB of RAM. The other has an NVIDIA V100 GPU and a 32-core Intel Xeon E5-2683 v4 with 256 GB of RAM. |
| Software Dependencies | No | The paper mentions implementing in Python and using the 'erdos_renyi_graph()' function from Python's NetworkX library and the 'random.rand()' function from Python's NumPy library. However, it does not provide specific version numbers for Python, NetworkX, or NumPy. |
| Experiment Setup | Yes | To ensure the stability of the learning process in both problems, we employ Double Deep Q-Networks with an update frequency of 10 episodes. The ϵ-greedy strategy we use is initialized with ϵ = 0.9 and decays by a factor of 0.98 after each single training, with a minimum threshold of 0.1. We train each model for 20 epochs and 10 episodes. Table 6: Experiment hyper-parameters for both problems (AIM / RP): Discount factor (γ): 0.99 / 0.99; Learning rate (higher-layer): 10⁻³ / 10⁻²; Learning rate (lower-layer): 10⁻⁴ / 10⁻⁴; Batch size: 32 / 32; DDQN update frequency: 10 episodes / 10 episodes; Initial ϵ in ϵ-greedy: 0.9 / 0.9; Decay factor for ϵ: 0.98 / 0.98; Minimum ϵ threshold: 0.1 / 0.1; Epochs: 20 / 20; Episodes: 10 / 10 |
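For context, the instance-generation procedure quoted in the Open Datasets row (Erdős–Rényi graphs with edge probability 0.01, and uniform city coordinates scaled to 100) can be sketched as follows. The function names, instance sizes, and seed are illustrative assumptions, not the authors' released code:

```python
import networkx as nx
import numpy as np

def make_er_graph(n_nodes, p=0.01, seed=0):
    # Erdos-Renyi random graph with edge probability 0.01,
    # as reported for the AIM instances.
    return nx.erdos_renyi_graph(n_nodes, p, seed=seed)

def make_city_coords(n_cities, scale=100.0):
    # Uniform city coordinates in [0, scale)^2, as reported
    # for the routing-problem (RP) instances.
    return np.random.rand(n_cities, 2) * scale

graph = make_er_graph(100)
coords = make_city_coords(50)
print(graph.number_of_nodes(), coords.shape)  # 100 (50, 2)
```

With p = 0.01 such graphs are very sparse, which is consistent with the paper training on graphs of 50–100 nodes and evaluating on larger ones such as power2500.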
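The ϵ-greedy schedule reported in the Experiment Setup row (start at 0.9, multiply by 0.98 after each training step, floor at 0.1) can be sketched as a small generator. This is a minimal illustration of the stated schedule, not the authors' implementation:

```python
def epsilon_schedule(start=0.9, decay=0.98, floor=0.1):
    # Yields the exploration rate for successive training steps:
    # multiplicative decay by `decay`, clipped below at `floor`.
    eps = start
    while True:
        yield eps
        eps = max(floor, eps * decay)

sched = epsilon_schedule()
first = next(sched)            # 0.9 on the first step
for _ in range(200):
    last = next(sched)         # after enough steps, the 0.1 floor is reached
```

Since 0.9 × 0.98^n drops below 0.1 after roughly 109 steps, the agent spends most of later training at the minimum exploration rate of 0.1.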