Multi-Objective Molecular Design Through Learning Latent Pareto Set

Authors: Yiping Liu, Jiahao Yang, Xuanbai Ren, Zhang Xinyi, Yuansheng Liu, Bosheng Song, Xiangxiang Zeng, Hisao Ishibuchi

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that MLPS achieves state-of-the-art performance across various multi-objective scenarios, encompassing diverse objective types and varying numbers of objectives. Its effectiveness was further validated on the real-world task of discovering antifungal peptides with low toxicity and high activity.
Researcher Affiliation | Academia | 1: College of Computer Science and Electronic Engineering, Hunan University, China; 2: Department of Computer Science and Engineering, Southern University of Science and Technology, China.
Pseudocode | Yes | Algorithm 1: The framework of MLPS; Algorithm 2: Global Pareto Set Learning Model Training.
Open Source Code | Yes | Code: https://github.com/JiahaoYoung0520/MLPS/source; extended version: https://github.com/JiahaoYoung0520/MLPS/extend-version.
Open Datasets | No | The paper mentions molecular properties such as drug-likeness (QED) and synthetic accessibility (SA) predicted using RDKit, and inhibition of GSK3β and JNK3 predicted using a random forest model. For the real-world task, candidates are scored with the Chemprop model (Jin, Barzilay, and Jaakkola 2020b) and the ToxinPred3.0 model (Rathore et al. 2023). It also mentions "5000 antimicrobial peptides as the initial population" but provides no access information for this dataset.
Dataset Splits | No | The paper does not explicitly mention training, validation, or test splits. It states "We generate 5000 molecules by each method" for evaluation, which refers to the generated solutions, not to predefined dataset splits for model training.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | No | The paper mentions using BoTorch (Balandat et al. 2020) and GPyTorch (Gardner et al. 2018) to implement Gaussian processes, and RDKit to predict molecular properties, but does not specify version numbers for any of these components.
Experiment Setup | Yes | Our approach utilizes a latent space with a dimensionality of 256, employing SELFIES as the molecular sequence representation. We employ the Sobol sampler (Renardy et al. 2021). The parameter β is set to 0.1. We divide the preference value range [0, 1] into a 128-dimensional scale and apply two layers of cross-attention to preferences. The training process for the global model is described in Algorithm 2.
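The preference-sampling step described in the Experiment Setup row (Sobol quasi-random sampling over a 128-level discretization of [0, 1]) can be sketched as follows. This is a minimal illustration using SciPy's Sobol sampler, not the paper's implementation; the function name, the number of objectives, and the simplex normalization are assumptions for the example.

```python
import numpy as np
from scipy.stats import qmc

def sample_preferences(n_prefs: int, n_objectives: int,
                       n_bins: int = 128, seed: int = 0) -> np.ndarray:
    """Draw quasi-random preference vectors with a Sobol sampler,
    snap each coordinate onto a 128-level grid over [0, 1], and
    normalize each row to sum to 1 (a point on the preference simplex)."""
    sampler = qmc.Sobol(d=n_objectives, scramble=True, seed=seed)
    raw = sampler.random(n_prefs)            # shape (n_prefs, n_objectives), values in [0, 1)
    # Discretize each coordinate to one of n_bins evenly spaced levels.
    binned = np.round(raw * (n_bins - 1)) / (n_bins - 1)
    # Guard against all-zero rows before normalizing.
    binned = np.clip(binned, 1.0 / n_bins, None)
    return binned / binned.sum(axis=1, keepdims=True)

prefs = sample_preferences(n_prefs=8, n_objectives=3)
print(prefs.shape)  # (8, 3)
```

Sobol sequences cover the unit hypercube more evenly than i.i.d. uniform draws, which is why quasi-random samplers are a common choice for sweeping preference vectors in Pareto set learning.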