Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

Authors: Runlong Zhou, Zelin He, Yuandong Tian, Yi Wu, Simon Shaolei Du

TMLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Lastly, we provide extensive experiments on the Best Choice Problem, Online Knapsack, and AdWords to verify our findings." |
| Researcher Affiliation | Collaboration | Runlong Zhou (University of Washington); Zelin He (University of Washington); Yuandong Tian (Facebook AI Research); Yi Wu (Tsinghua University); Simon S. Du (University of Washington) |
| Pseudocode | Yes | Algorithm 1: NPG (full version: Algorithm 3). Algorithm 2: Curriculum learning framework. Algorithm 3: Sample-based NPG (full version). Algorithm 4: Sample, a sampler for s ~ d^samp_{m,h}, where m ~ Multinomial(w_1, ..., w_M) and a ~ Unif(A) if unif. |
| Open Source Code | Yes | "Due to page limit, more formulation details and results are presented in Appendix C, and code can be found at https://github.com/zhourunlong/RL-for-Combinatorial-Optimization." |
| Open Datasets | No | "For BCP, each instance is a permutation of length n, and in each round an instance is drawn from an unknown distribution over all permutations. In Online Knapsack, the decision-maker observes n (which is known) items arriving sequentially, each with value v_i and size s_i revealed upon arrival. In ADW, there are n advertisers, each with budget 1, and m ad slots. Each ad slot j arrives sequentially along with a vector (v_{1,j}, v_{2,j}, ..., v_{n,j}), where v_{i,j} is the value that advertiser i is willing to pay for ad slot j." |
| Dataset Splits | Yes | "For the warm-up phases we set n = 10 and for the final phases n = 100." |
| Hardware Specification | Yes | "All the experiments were run on a server with an AMD Ryzen 9 3950X CPU, an NVIDIA GeForce 2080 Super GPU, and 128 GB of memory." |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers. |
| Experiment Setup | Yes | "They shared a learning rate of 0.2, a batch size of 100 per step in the horizon, final n = 100, and warm-up n = 10 (if curriculum learning was applied)." |
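For concreteness, the Online Knapsack setting quoted under "Open Datasets" can be rendered as a minimal sequential-decision environment: n items arrive one at a time, each revealing its value v_i and size s_i on arrival, and the decision-maker irrevocably accepts or rejects subject to a capacity budget. This is a sketch under stated assumptions, not the authors' code; the class name `OnlineKnapsackEnv`, the unit capacity, and the uniform instance distribution are all hypothetical choices for illustration.

```python
import random

class OnlineKnapsackEnv:
    """Illustrative sketch of the Online Knapsack setting: n items arrive
    sequentially, each revealing (v_i, s_i) on arrival, and the agent
    accepts or rejects irrevocably subject to a capacity budget.
    All names and mechanics here are hypothetical, not the paper's code."""

    def __init__(self, n, capacity=1.0, seed=None):
        self.n = n
        self.capacity = capacity
        self.rng = random.Random(seed)

    def reset(self):
        # Draw a fresh instance: (value, size) pairs, revealed one per round.
        self.items = [(self.rng.random(), self.rng.random()) for _ in range(self.n)]
        self.t = 0
        self.used = 0.0
        self.total_value = 0.0
        return self.items[0]  # first observation: (v_1, s_1)

    def step(self, accept):
        v, s = self.items[self.t]
        if accept and self.used + s <= self.capacity:
            # Irrevocable acceptance: consume capacity, collect value.
            self.used += s
            self.total_value += v
        self.t += 1
        done = self.t == self.n
        obs = None if done else self.items[self.t]
        return obs, self.total_value, done
```

A trivial policy that accepts whenever the item fits can be rolled out by looping `env.step(True)` until `done`; a learned policy would instead map the revealed (v_i, s_i) and remaining capacity to an accept/reject probability.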
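The curriculum schedule implied by the setup rows (warm up on n = 10, then a final phase on n = 100, sharing a learning rate of 0.2 and a batch size of 100) can be sketched as a two-phase training loop. `train_npg` below is a hypothetical stand-in for the paper's sample-based NPG (Algorithm 3): it only records the arguments a real phase would be invoked with, so the sketch shows the schedule, not the optimizer.

```python
def train_npg(policy, n, lr=0.2, batch_size=100):
    """Hypothetical stand-in for one NPG training phase on horizon n.
    A real implementation would run sample-based NPG updates; this one
    just logs the phase configuration on the policy object."""
    policy.setdefault("history", []).append(
        {"n": n, "lr": lr, "batch_size": batch_size}
    )
    return policy

def curriculum(policy, warmup_n=10, final_n=100, lr=0.2, batch_size=100):
    # Phase 1: warm up the policy on small instances (n = 10 in the paper).
    policy = train_npg(policy, n=warmup_n, lr=lr, batch_size=batch_size)
    # Phase 2: continue training the SAME policy on full-size
    # instances (n = 100), reusing the warm-start parameters.
    policy = train_npg(policy, n=final_n, lr=lr, batch_size=batch_size)
    return policy
```

The key design point the sketch preserves is that the warm-up phase and the final phase share one policy object: curriculum learning here means warm-starting the large-horizon problem from parameters trained on the small one, not training two separate models.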