MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning

Authors: Peter Eckmann, Dongxia Wu, Germano Heinzelmann, Michael K Gilson, Rose Yu

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on two disease-relevant proteins show that MF-LAL produces compounds with significantly better binding free energy scores than other single- and multi-fidelity approaches (~50% improvement in mean binding free energy score). The paper also includes a dedicated section titled "4. Experiments" with subsections for setup, baselines, and results.
Researcher Affiliation Academia 1Department of Computer Science and Engineering, UC San Diego, La Jolla, California, United States 2Departamento de Física, Universidade Federal de Santa Catarina, Brazil 3Department of Chemistry and Biochemistry, UC San Diego, La Jolla, California, United States 4Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, United States. Correspondence to: Peter Eckmann <EMAIL>, Michael K. Gilson <EMAIL>, Rose Yu <EMAIL>.
Pseudocode Yes The paper includes clearly labeled algorithm blocks: "Algorithm 1 Active learning for MF-LAL" and "Algorithm 2 MF-LAL molecule generation procedure".
Open Source Code Yes The code is available at https://github.com/Rose-STL-Lab/MF-LAL.
Open Datasets Yes We used BRD4(2) and, separately, c-MET data from BindingDB (Liu et al., 2007) to train a simple linear regression model.
Dataset Splits No Each model was provided with an initial dataset of random ZINC250k (Irwin et al., 2012) compounds evaluated at each fidelity. We selected 5 random compounds for each fidelity, except the first, for which we supplied 200,000 compounds and their associated oracle outputs. This describes initial dataset provisioning for active learning, but not explicit train/test/validation splits for model evaluation.
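The provisioning scheme above can be sketched as follows. This is an illustrative sketch, not the authors' code; the number of fidelity levels (here 4) and the toy compound pool are assumptions, and a real run would draw SMILES strings from ZINC250k.

```python
import random

NUM_FIDELITIES = 4          # assumed number of fidelity levels (not stated in this excerpt)
FIRST_FIDELITY_SIZE = 200_000
HIGHER_FIDELITY_SIZE = 5

def initial_datasets(compound_pool, seed=0):
    """Sample the initial per-fidelity datasets: 200,000 compounds for the
    first (cheapest) fidelity and 5 random compounds for each higher one."""
    rng = random.Random(seed)
    datasets = []
    for level in range(NUM_FIDELITIES):
        n = FIRST_FIDELITY_SIZE if level == 0 else HIGHER_FIDELITY_SIZE
        datasets.append(rng.sample(compound_pool, n))
    return datasets

# Toy stand-in for the ZINC250k compound library:
pool = [f"compound_{i}" for i in range(250_000)]
sets = initial_datasets(pool)
print([len(s) for s in sets])  # [200000, 5, 5, 5]
```

Each sampled compound would then be evaluated by its fidelity's oracle (e.g. docking at the lowest level) before active learning begins.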
Hardware Specification Yes All experiments were conducted on a server with 8 RTX 2080 Ti GPUs.
Software Dependencies No We used AutoDock-GPU (Santos-Martins et al., 2021), a GPU-accelerated version of AutoDock4, for all docking computation. All molecular dynamics simulations are run with AMBER with GPU support. While specific software and tools are mentioned, precise version numbers for these components are not provided.
Experiment Setup Yes The encoder, decoder, and h networks are all 3-layer feed-forward networks with ReLU activations and a 512-dimensional hidden layer. Each latent space has 64 dimensions. At each active learning step, we train the whole model from scratch until convergence with the Adam optimizer using a learning rate of 0.0001. For the molecule generation procedure using gradient-based optimization, we use the Adam optimizer with a learning rate of 0.1 for 100 epochs. We set β = 1 during active learning, and β = 0 during inference to focus only on the most promising compounds.
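A minimal NumPy sketch of the network shape described in the setup: a 3-layer feed-forward network with ReLU activations, a 512-dimensional hidden layer, and a 64-dimensional latent output. The input dimension (256) and initialization scale are illustrative assumptions; the actual model would be built and trained in a deep learning framework with Adam, as the setup states.

```python
import numpy as np

HIDDEN_DIM = 512   # hidden layer width from the paper's setup
LATENT_DIM = 64    # latent space dimensionality from the paper's setup

def init_mlp(rng, in_dim, out_dim, hidden_dim=HIDDEN_DIM):
    """Initialize weights and biases for a 3-layer feed-forward network."""
    dims = [in_dim, hidden_dim, hidden_dim, out_dim]
    return [(rng.standard_normal((dims[i], dims[i + 1])) * 0.01,
             np.zeros(dims[i + 1]))
            for i in range(3)]

def forward(params, x):
    """Forward pass: ReLU on the two hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU activation
    return x

rng = np.random.default_rng(0)
encoder = init_mlp(rng, in_dim=256, out_dim=LATENT_DIM)  # 256 is an assumed input size
x = rng.standard_normal((4, 256))                        # batch of 4 example inputs
z = forward(encoder, x)
print(z.shape)  # (4, 64): each input mapped into the 64-dimensional latent space
```

The decoder and the fidelity-specific h networks would follow the same 3-layer pattern with their own input/output dimensions.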