Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design

Authors: Onur Boyar, Hiroyuki Hanada, Ichiro Takeuchi

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive evaluations across diverse optimization tasks including rediscovery, docking score, and multi-property optimization show that CLaSMO efficiently enhances target properties, delivers remarkable sample efficiency crucial for resource-limited applications while considering molecular similarity constraints, achieves state-of-the-art performance, and maintains practical synthetic accessibility. We evaluate our approach on a diverse suite of 20 molecular optimization tasks that span a wide range of objectives... In Section 5.3, we provide an ablation study to examine the impact of conditional generation.
Researcher Affiliation | Academia | Onur Boyar (Nagoya University); Hiroyuki Hanada (Nagoya University); Ichiro Takeuchi (Nagoya University, RIKEN)
Pseudocode | Yes | Algorithm 1: CLaSMO
Open Source Code | Yes | The source code of this work is available at https://github.com/onurboyar/CLASMO-TMLR. We also provide an open-source web application, https://clasmo.streamlit.app/, that enables interactive optimization of input molecules via CLaSMO: chemical experts can choose which region of the input molecule to modify, enabling Human-in-the-Loop optimization settings.
Open Datasets | Yes | To this end, we evaluated two widely used open-source molecular datasets: QM9 (Ruddigkeit et al., 2012; Ramakrishnan et al., 2014) and ZINC250K (Gómez-Bombarelli et al., 2018).
Dataset Splits | Yes | Among the 18,706 instances, 80% were used for model training, with the remaining data allocated for testing and validation.
Hardware Specification | No | The paper does not provide specific hardware details for its experiments. It mentions 'increased computational power' in the introduction, but gives no concrete specifications.
Software Dependencies | No | The paper implicitly mentions software such as PyTorch and RDKit (e.g., through phrases like 'PyTorch's ReduceLROnPlateau' and 'using a machine learning classifier trained on retrosynthesis outcomes', which typically relies on cheminformatics libraries like RDKit), but does not specify version numbers for any of the software components used in the experiments.
Experiment Setup | Yes | Models were evaluated based on their reconstruction performance and generative diversity, leading us to select the model with β = 0.000001 as the optimal candidate. The initial learning rate was set to 0.001. For CLaSMO's penalization terms, we opted for a straightforward approach rather than an exhaustive hyperparameter optimization process. We set λ1 = 5 to penalize cases where the similarity constraint DICE(S, S') > τ was violated. Similarly, we assigned λ2 = 7.5 for situations where the generated substructure could not be added to the input scaffold. We determined that a 2-dimensional latent space was sufficient to achieve over 90% reconstruction accuracy on the test set... we generated a 2-dimensional embedding for the condition vectors. For Graph GA, the best results were obtained with a mutation rate of 4.5% and a population size of 5. For SMILES-GA... the population size was set to 5. Experiments were conducted with a fixed budget of 100 oracle evaluations per seed across 10 seeds.
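The 18,706-instance split quoted under Dataset Splits can be sketched as follows. The paper fixes only the 80% training share; the even division of the remainder into validation and test halves, the seed, and the function name are assumptions for illustration.

```python
import random

def split_dataset(instances, train_frac=0.80, val_frac=0.10, seed=0):
    """Shuffle and split into train/val/test subsets.

    Only the 80% training share is stated in the paper; the 10%/10%
    val/test split of the remainder is an assumed choice.
    """
    rng = random.Random(seed)
    idx = list(range(len(instances)))
    rng.shuffle(idx)
    n_train = int(len(idx) * train_frac)
    n_val = int(len(idx) * val_frac)
    train = [instances[i] for i in idx[:n_train]]
    val = [instances[i] for i in idx[n_train:n_train + n_val]]
    test = [instances[i] for i in idx[n_train + n_val:]]
    return train, val, test

# With 18,706 instances as in the paper, 80% -> 14,964 training examples.
train, val, test = split_dataset(list(range(18706)))
```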
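The penalty terms quoted under Experiment Setup (λ1 = 5 for violating the DICE(S, S') > τ similarity constraint, λ2 = 7.5 for substructures that cannot be attached to the scaffold) can be sketched as a penalized scoring function. This is a minimal illustration, assuming the Dice coefficient is computed over fingerprint bit sets; the threshold τ = 0.6, the function names, and the boolean attachability flag are hypothetical, not taken from the paper.

```python
def dice_similarity(a, b):
    """Dice coefficient between two fingerprint bit sets: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

def penalized_score(raw_score, scaffold_fp, modified_fp, attachable,
                    tau=0.6, lam1=5.0, lam2=7.5):
    """Subtract the paper's penalty terms from the raw oracle score.

    lam1 = 5 is applied when the similarity constraint DICE(S, S') > tau
    is violated; lam2 = 7.5 when the generated substructure cannot be
    attached to the input scaffold. tau = 0.6 is an assumed placeholder.
    """
    score = raw_score
    if dice_similarity(scaffold_fp, modified_fp) <= tau:
        score -= lam1
    if not attachable:
        score -= lam2
    return score
```

In a real pipeline the bit sets would come from molecular fingerprints (e.g., via a cheminformatics library such as RDKit), and the raw score from the optimization oracle.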