Multi-Objective Molecular Design Through Learning Latent Pareto Set

Authors: Yiping Liu, Jiahao Yang, Xuanbai Ren, Zhang Xinyi, Yuansheng Liu, Bosheng Song, Xiangxiang Zeng, Hisao Ishibuchi

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that MLPS achieves state-of-the-art performance across various multi-objective scenarios, encompassing diverse objective types and varying numbers of objectives. Its effectiveness was further validated on the real-world task of discovering antifungal peptides with low toxicity and high activity.
Researcher Affiliation | Academia | 1: College of Computer Science and Electronic Engineering, Hunan University, China; 2: Department of Computer Science and Engineering, Southern University of Science and Technology, China.
Pseudocode | Yes | Algorithm 1: The framework of MLPS; Algorithm 2: Global Pareto Set Learning Model Training.
Open Source Code | Yes | Code: https://github.com/JiahaoYoung0520/MLPS/source; extended version: https://github.com/JiahaoYoung0520/MLPS/extend-version.
Open Datasets | No | The paper mentions molecular properties such as drug-likeness (QED) and synthetic accessibility (SA) predicted using RDKit, and inhibition of GSK3β and JNK3 predicted using a random forest model. For the real-world task, candidates are scored with the Chemprop model (Jin, Barzilay, and Jaakkola 2020b) and the ToxinPred3.0 model (Rathore et al. 2023). It also mentions "5000 antimicrobial peptides as the initial population" but provides no access information for this dataset.
Dataset Splits | No | The paper does not explicitly mention training, validation, or test splits. It states "We generate 5000 molecules by each method" for evaluation, which refers to the generated solutions, not to predefined dataset splits for model training.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | No | The paper mentions using BoTorch (Balandat et al. 2020) and GPyTorch (Gardner et al. 2018) to implement Gaussian processes, and RDKit to predict molecular properties, but does not specify version numbers for any of these components.
Experiment Setup | Yes | Our approach utilizes a latent space with a dimensionality of 256, employing SELFIES as the molecular sequence representation. We employ the Sobol sampler (Renardy et al. 2021). The parameter β is set to 0.1. We divide the preference value range [0, 1] into a 128-dimensional scale and apply two layers of cross-attention to preferences. The training process for the global model is described in Algorithm 2.
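The preference-sampling step described in the Experiment Setup row (Sobol quasi-random sampling over a 128-level discretization of [0, 1]) can be sketched as follows. This is a minimal illustration using SciPy's Sobol sampler, not the paper's implementation; the function name, the number of objectives, and the simplex normalization are assumptions for the example.

```python
import numpy as np
from scipy.stats import qmc

def sample_preferences(n_prefs: int, n_objectives: int,
                       n_bins: int = 128, seed: int = 0) -> np.ndarray:
    """Draw quasi-random preference vectors with a Sobol sampler,
    snap each coordinate onto a 128-level grid over [0, 1], and
    normalize each row to sum to 1 (a point on the preference simplex)."""
    sampler = qmc.Sobol(d=n_objectives, scramble=True, seed=seed)
    raw = sampler.random(n_prefs)            # shape (n_prefs, n_objectives), values in [0, 1)
    # Discretize each coordinate to one of n_bins evenly spaced levels.
    binned = np.round(raw * (n_bins - 1)) / (n_bins - 1)
    # Guard against all-zero rows before normalizing.
    binned = np.clip(binned, 1.0 / n_bins, None)
    return binned / binned.sum(axis=1, keepdims=True)

prefs = sample_preferences(n_prefs=8, n_objectives=3)
print(prefs.shape)  # (8, 3)
```

Sobol sequences cover the unit hypercube more evenly than i.i.d. uniform draws, which is why quasi-random samplers are a common choice for sweeping preference vectors in Pareto set learning.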