Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization
Authors: Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon Shaolei Du
TMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we provide extensive experiments on the Best Choice Problem, Online Knapsack, and Ad Words to verify our findings. |
| Researcher Affiliation | Collaboration | Runlong Zhou EMAIL University of Washington Zelin He EMAIL University of Washington Yuandong Tian EMAIL Facebook AI Research Yi Wu EMAIL Tsinghua University Simon S. Du EMAIL University of Washington |
| Pseudocode | Yes | Algorithm 1 NPG (Full version: Algorithm 3) Algorithm 2 Curriculum learning framework. Algorithm 3 NPG: Sample-based NPG (full version). Algorithm 4 Sample: Sampler for s dĪsamp m,h where m Multinomial pw1, . . . , w Mq, a Unif A if unif |
| Open Source Code | Yes | Due to page limit, more formulation details and results are presented in Appendix C, and code can be found at https://github.com/zhourunlong/ RL-for-Combinatorial-Optimization. |
| Open Datasets | No | For BCP, each instance is a permutation of length n, and in each round an instance is drawn from an unknown distribution over all permutations. In Online Knapsack problems the decision-maker observes n (which is known) items arriving sequentially, each with value vi and size si revealed upon arrival. In ADW, there are n advertisers each with budget 1 and m ad slots. Each ad slot j arrives sequentially along with a vector pv1,j, v2,j, . . . , vn,jq where vi,j is the value that advertiser i wants to pay for ad slot j. |
| Dataset Splits | Yes | For the warm-up phases we set n 10 and for final phases n 100. |
| Hardware Specification | Yes | All the experiments were run on a server with CPU AMD Ryzen 9 3950X, GPU NVIDIA Ge Force 2080 Super and 128G memory. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers. |
| Experiment Setup | Yes | They shared a learning rate of 0.2, batch size of 100 per step in horizon, final n 100 and warm-up n 10 (if applied curriculum learning). |