LICO: Large Language Models for In-Context Molecular Optimization

Authors: Tung Nguyen, Aditya Grover

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We evaluate LICO on molecular optimization, where the goal is to design new molecules with desired properties such as high chemical stability, low toxicity, or selective inhibition against a target disease. This problem plays a pivotal role in advancing drug and material discovery. ... We evaluate LICO on Practical Molecular Optimization (PMO) (Gao et al., 2022), a standard benchmark for molecular optimization with a focus on sample efficiency. We experiment on 23 optimization objectives provided by PMO... Table 1 summarizes the performance of the 7 considered methods across 23 optimization tasks in PMO-1K.
Researcher Affiliation | Academia | Tung Nguyen & Aditya Grover, Department of Computer Science, University of California, Los Angeles
Pseudocode | Yes | Algorithm 1 outlines the optimization algorithm using LICO as the surrogate model. (Algorithm 1: Black-box optimization with LICO)
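The quoted Algorithm 1 is not reproduced in this review. A minimal sketch of such a surrogate-guided black-box optimization loop, using the population size (34), pool size (100), mutation rate (0.01), and top-k (15) quoted elsewhere in this report, might look like the following. The string-based `crossover`/`mutate` stand-ins and the `surrogate_utility` callable are illustrative assumptions, not the paper's actual operators:

```python
import random


def crossover(a: str, b: str) -> str:
    # Toy stand-in for the graph-based crossover operator.
    cut = random.randint(0, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def mutate(x: str, rate: float, alphabet: str = "CNO") -> str:
    # Toy stand-in for graph mutation: flip each character with probability `rate`.
    return "".join(random.choice(alphabet) if random.random() < rate else c
                   for c in x)


def optimize(surrogate_utility, objective, init_pop, iterations=10,
             pool_size=100, top_k=15, mutation_rate=0.01):
    """Sketch of black-box optimization with a learned surrogate.

    `surrogate_utility(x, observed)` is a hypothetical call that scores a
    candidate conditioned on all (x, y) pairs observed so far, as an
    in-context surrogate like LICO would.
    """
    observed = [(x, objective(x)) for x in init_pop]
    for _ in range(iterations):
        # Use the best molecules observed so far as parents.
        parents = [x for x, _ in sorted(observed, key=lambda p: p[1],
                                        reverse=True)[:len(init_pop)]]
        # Generate a candidate pool via crossover and mutation.
        pool = [mutate(crossover(*random.sample(parents, 2)), mutation_rate)
                for _ in range(pool_size)]
        # Score the pool with the surrogate; evaluate only the top-k
        # candidates on the expensive true objective.
        pool.sort(key=lambda x: surrogate_utility(x, observed), reverse=True)
        observed += [(x, objective(x)) for x in pool[:top_k]]
    return max(observed, key=lambda p: p[1])
```

The key design point is that the surrogate filters a cheap candidate pool so the true objective is queried only k times per iteration, which is what makes the method sample-efficient.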
Open Source Code | No | The paper does not contain an explicit statement offering access to the source code for the described methodology, nor does it provide a link to a code repository. Phrases like 'We release our code...' or links to GitHub were not found.
Open Datasets | Yes | We use ZINC 250K as the unlabeled dataset D_u. ZINC 250K contains around 250,000 molecules sampled from the full ZINC database (Sterling & Irwin, 2015) with moderate size and high pharmaceutical relevance and popularity.
Dataset Splits | Yes | For each task, we vary the number of examples given to each method from 32 to 512, and evaluate their performance on 128 held-out data points. ... Each data point is a sequence of (x, y) pairs with length n ~ U[64, 800].
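The n ~ U[64, 800] sequence construction quoted above can be sketched as follows. This is a hypothetical helper (the paper's actual batching code is not available), and sampling with replacement is an assumption:

```python
import random


def sample_sequence(pairs, n_min=64, n_max=800):
    """Draw one training example: a sequence of (x, y) pairs whose
    length n is sampled uniformly from [n_min, n_max].
    """
    n = random.randint(n_min, n_max)
    # With replacement for simplicity; the quoted text does not specify
    # whether pairs may repeat within a sequence.
    return random.choices(pairs, k=n)
```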
Hardware Specification | Yes | All experiments in this paper are run on a cluster of 4 A6000 GPUs, each with 49GB of memory.
Software Dependencies | No | The paper mentions specific LLM models (e.g., Llama-2-7b, Qwen-1.5, Phi-2, T5-base, Nach0-base) and techniques (LoRA, Liger Kernel) with their respective publication years or specific model identifiers. However, it does not provide specific version numbers for general software libraries, programming languages, or frameworks (e.g., Python, PyTorch, CUDA), or other ancillary tools used for implementation.
Experiment Setup | Yes | We train LICO for 20000 iterations with a batch size of 4, where each data point is a sequence of (x, y) pairs sampled from an intrinsic or synthetic function. The ratio of synthetic data is 0.1. ... We use a base learning rate of 5e-4 with a linear warmup for 1000 steps and a cosine decay for the remaining 19000 steps. We use LoRA with a rank of 16 and α scale of 16. ... We initialize the observed dataset D_obs with a population of 34 molecules sampled randomly from ZINC. At each iteration, we use the best 34 candidates in D_obs to generate new candidates via crossover and mutation operations, with the mutation rate being 0.01. The candidate pool size C is 100. ... We set β = 10^b, where b ~ U[-0.5, 1.5]. We then pick k = 15 candidates with the highest utility scores.
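Two of the quoted hyperparameter choices can be sketched concretely: the warmup-plus-cosine learning-rate schedule, and the β-scaled candidate selection. Decaying to exactly zero, the UCB-style form of the utility (mean + β·std), and the minus sign in b ~ U[-0.5, 1.5] are assumptions in this sketch, not details confirmed by the quote:

```python
import math
import random


def lr_schedule(step, base_lr=5e-4, warmup=1000, total=20000):
    """Linear warmup for the first 1000 steps, then cosine decay over
    the remaining 19000 steps (decay-to-zero is an assumption)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def select_candidates(means, stds, k=15):
    """Pick the k candidates with the highest utility mean + beta * std,
    where beta = 10^b and b ~ U[-0.5, 1.5] is resampled each call."""
    beta = 10 ** random.uniform(-0.5, 1.5)
    utility = [m + beta * s for m, s in zip(means, stds)]
    return sorted(range(len(means)), key=lambda i: utility[i],
                  reverse=True)[:k]
```

Sampling β log-uniformly per iteration varies the exploration-exploitation trade-off across iterations instead of committing to one fixed coefficient.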