reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Constrained Offline Black-Box Optimization via Risk Evaluation and Management

Authors: Yiyi Zhu, Huakang Lu, Yupeng Wu, Shuo Liu, Jing-Wen Yang, Hong Qian

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on real-world tasks, e.g., space missions, process synthesis, and design problems, showcase COOREM's effectiveness in managing both OOD risk and constrained risk. Furthermore, our findings indicate that COOREM could outperform online methods that need to access the objective function in certain space missions. [...] Experiments: This section first introduces the experimental setup, including baselines and tasks. Extensive experiments are conducted on these tasks to answer the following questions.
Researcher Affiliation	Collaboration	(1) School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; (2) Intelligent NPC Team, Game AI Center, Tencent Inc, Shenzhen 518057, China
Pseudocode	Yes	Algorithm 1: Constrained Offline Optimization via Risk Evaluation and Management (COOREM)
Open Source Code	Yes	The code of this paper is available at https://github.com/zhuyiyi123/COOREM.
Open Datasets	Yes	We conduct experiments on two GTOPX space mission tasks, Cassini1 and Cassini1-MINLP (Schlueter et al. 2021), and three CEC tasks (Kumar et al. 2020): the three-bar truss design problem, the process synthesis problem, and the welded beam design problem.
Dataset Splits	No	The paper states that an offline dataset "D = D+ ∪ D- = {(x_1, y_1), ..., (x_N, y_N)} ∪ {x_1, ..., x_M}" is available, and refers to the "feasible dataset D+" and the "infeasible dataset D-". However, it does not specify how this dataset is split into training, validation, or test sets for the experiments, nor does it give percentages or counts for any split. It states only that the "same dataset" was provided to all COO methods.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions support from the National Natural Science Foundation of China.
Software Dependencies	No	The paper mentions using a "DNN model" and that "constrained evolutionary optimization, which is implemented by Scikit-opt." However, it does not provide specific version numbers for these or any other software components, libraries, or programming languages used in the experiments.
Experiment Setup	No	The paper identifies several hyperparameters such as "learning rate η, maximum Langevin dynamics step K, Langevin dynamics stepsize λ, and initial momentum m," as well as "σ and τ". It also states that "hyperparameter analysis on the impact of the step of Langevin Dynamics K, risk control σ and τ" is available in Appendix B. However, the main text does not explicitly provide the specific values for these hyperparameters used in the experiments.
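
The rows marked "No" above (dataset splits, hardware, software versions, hyperparameter values) are all details that a small run manifest could record alongside the experiments. The following is a minimal sketch of such a manifest, not something taken from the paper or its repository. The function names, the JSON layout, and the package list are assumptions made for illustration, and every hyperparameter and dataset size is left as None because the main text does not report the values.

```python
import json
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_manifest(packages, hyperparameters, split_sizes):
    """Collect hardware, software, hyperparameter, and split details in one dict."""
    return {
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            "system": platform.system(),
        },
        "software": {
            "python": platform.python_version(),
            "packages": {name: package_version(name) for name in packages},
        },
        "hyperparameters": hyperparameters,
        "dataset_splits": split_sizes,
    }


# Hypothetical placeholders: values stay None because the main text of the
# paper does not state them. The package list is likewise an assumption.
manifest = build_manifest(
    packages=["scikit-opt", "numpy"],
    hyperparameters={
        "eta": None,     # learning rate
        "K": None,       # maximum Langevin dynamics step
        "lambda": None,  # Langevin dynamics stepsize
        "m": None,       # initial momentum
        "sigma": None,   # risk control
        "tau": None,     # risk control
    },
    split_sizes={"feasible_N": None, "infeasible_M": None},
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Writing this JSON next to each set of results would give a reader the version numbers and settings that the corresponding table rows look for; GPU details are not covered by the standard library and would need a separate, framework-specific query.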