Offline Model-Based Optimization by Learning to Rank

Authors: Rong-Xi Tan, Ke Xue, Shen-Huan Lyu, Haopu Shang, Yao Wang, Yaoyuan Wang, Fu Sheng, Chao Qian

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results across diverse tasks demonstrate the superior performance of our proposed ranking-based method over twenty existing methods. Our implementation is available at https://github.com/lamda-bbo/Offline-RaM. In this section, we empirically compare the proposed method with a large variety of previous offline MBO methods on various tasks.
Researcher Affiliation Collaboration 1 National Key Laboratory for Novel Software Technology, Nanjing University, China 2 School of Artificial Intelligence, Nanjing University, China 3 Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, China 4 College of Computer Science and Software Engineering, Hohai University, China 5 Advanced Computing and Storage Lab, Huawei Technologies Co., Ltd., China
Pseudocode Yes Algorithm 1 Offline MBO by Learning to Rank
Open Source Code Yes Our implementation is available at https://github.com/lamda-bbo/Offline-RaM.
Open Datasets Yes We benchmark our method on Design-Bench tasks (Trabucco et al., 2022), including three continuous tasks and two discrete tasks.
Dataset Splits Yes We split the dataset into a training set and a validation set with a ratio of 8 : 2. ... In Design-Bench, the training dataset is selected as the bottom-performing x% of the entire collected dataset (i.e., x = 40, 50, 60). ... We identify the excluded (100 − x)% high-scoring data to comprise the OOD dataset for analysis...
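The split protocol described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code: the dataset, the score array, and the variable names are all hypothetical, and the bottom-x% selection and 8 : 2 train/validation split follow the quoted description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical collected dataset: 10,000 designs X with scalar scores y.
X = rng.normal(size=(10_000, 8))
y = rng.normal(size=10_000)

# Offline MBO convention: the visible dataset is the bottom-performing x%
# of the collection (e.g. x = 40); the rest is held out.
x_pct = 40
cutoff = np.percentile(y, x_pct)
visible = y <= cutoff
X_visible, y_visible = X[visible], y[visible]
X_ood, y_ood = X[~visible], y[~visible]  # excluded (100 - x)% high-scoring data, used for OOD analysis

# Within the visible data, an 8 : 2 train/validation split.
idx = rng.permutation(len(X_visible))
n_train = int(0.8 * len(idx))
train_idx, val_idx = idx[:n_train], idx[n_train:]
```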
Hardware Specification No No specific hardware details (like GPU/CPU models or cloud instance types) were provided in the paper. The paper mentions using PyTorch for implementation but does not specify the hardware used to run experiments.
Software Dependencies No The paper mentions using PyTorch (Paszke et al., 2019) and Adam (Kingma & Ba, 2015) but does not provide specific version numbers for these software components.
Experiment Setup Yes We set the size n of the training dataset to 10,000, and following LETOR 4.0 (Qin & Liu, 2013; Qin et al., 2010b), a prevalent benchmark for LTR, we set the list length m = 1000. ... The model is optimized using Adam (Kingma & Ba, 2015) with a learning rate of 3 × 10⁻⁴ and a weight decay coefficient of 1 × 10⁻⁵. After the model is trained... we set η = 1 × 10⁻³ and T = 200 for continuous tasks, and η = 1 × 10⁻¹ and T = 100 for discrete tasks to search for the final design. We use ReLU as activation functions.
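The final search step quoted above is plain gradient ascent on the trained surrogate, x_{t+1} = x_t + η ∇_x f(x_t), run for T steps with the stated η. A minimal NumPy sketch, assuming a toy quadratic surrogate with an analytic gradient in place of the trained ranking model (the paper itself uses a PyTorch model and autograd):

```python
import numpy as np

# Toy differentiable surrogate standing in for the trained model:
# f(x) = -||x - x_star||^2, maximized at a known point x_star (hypothetical).
x_star = np.full(8, 0.5)

def grad_f(x):
    """Analytic gradient of the toy surrogate f."""
    return -2.0 * (x - x_star)

def search_design(x0, eta=1e-3, T=200):
    """Gradient-ascent search: x_{t+1} = x_t + eta * grad f(x_t).

    eta = 1e-3, T = 200 match the paper's continuous-task settings;
    discrete tasks use eta = 1e-1, T = 100.
    """
    x = x0.copy()
    for _ in range(T):
        x = x + eta * grad_f(x)
    return x

x_final = search_design(np.zeros(8))
```

Each step moves the candidate design uphill on the surrogate's predicted score; after T steps the iterate has contracted toward the surrogate optimum.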