LICO: Large Language Models for In-Context Molecular Optimization

Authors: Tung Nguyen, Aditya Grover

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We evaluate LICO on molecular optimization, where the goal is to design new molecules with desired properties such as high chemical stability, low toxicity, or selective inhibition against a target disease. This problem plays a pivotal role in advancing drug and material discovery. ... We evaluate LICO on Practical Molecular Optimization (PMO) (Gao et al., 2022), a standard benchmark for molecular optimization with a focus on sample efficiency. We experiment on 23 optimization objectives provided by PMO... Table 1 summarizes the performance of the 7 considered methods across 23 optimization tasks in PMO-1K.
Researcher Affiliation | Academia | Tung Nguyen & Aditya Grover, Department of Computer Science, University of California, Los Angeles
Pseudocode | Yes | Algorithm 1 outlines the optimization algorithm using LICO as the surrogate model. (Algorithm 1: Black-box optimization with LICO)
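The quoted Algorithm 1 is not reproduced in this review. A minimal sketch of such a surrogate-guided black-box optimization loop, using the population size (34), pool size (100), mutation rate (0.01), and top-k (15) quoted elsewhere in this report, might look like the following. The string-based `crossover`/`mutate` stand-ins and the `surrogate_utility` callable are illustrative assumptions, not the paper's actual operators:

```python
import random


def crossover(a: str, b: str) -> str:
    # Toy stand-in for the graph-based crossover operator.
    cut = random.randint(0, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def mutate(x: str, rate: float, alphabet: str = "CNO") -> str:
    # Toy stand-in for graph mutation: flip each character with probability `rate`.
    return "".join(random.choice(alphabet) if random.random() < rate else c
                   for c in x)


def optimize(surrogate_utility, objective, init_pop, iterations=10,
             pool_size=100, top_k=15, mutation_rate=0.01):
    """Sketch of black-box optimization with a learned surrogate.

    `surrogate_utility(x, observed)` is a hypothetical call that scores a
    candidate conditioned on all (x, y) pairs observed so far, as an
    in-context surrogate like LICO would.
    """
    observed = [(x, objective(x)) for x in init_pop]
    for _ in range(iterations):
        # Use the best molecules observed so far as parents.
        parents = [x for x, _ in sorted(observed, key=lambda p: p[1],
                                        reverse=True)[:len(init_pop)]]
        # Generate a candidate pool via crossover and mutation.
        pool = [mutate(crossover(*random.sample(parents, 2)), mutation_rate)
                for _ in range(pool_size)]
        # Score the pool with the surrogate; evaluate only the top-k
        # candidates on the expensive true objective.
        pool.sort(key=lambda x: surrogate_utility(x, observed), reverse=True)
        observed += [(x, objective(x)) for x in pool[:top_k]]
    return max(observed, key=lambda p: p[1])
```

The key design point is that the surrogate filters a cheap candidate pool so the true objective is queried only k times per iteration, which is what makes the method sample-efficient.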
Open Source Code | No | The paper does not contain an explicit statement offering access to the source code for the described methodology, nor does it provide a link to a code repository. Phrases like 'We release our code...' or links to GitHub were not found.
Open Datasets | Yes | We use ZINC 250K as the unlabeled dataset D_u. ZINC 250K contains around 250,000 molecules sampled from the full ZINC database (Sterling & Irwin, 2015) with moderate size and high pharmaceutical relevance and popularity.
Dataset Splits | Yes | For each task, we vary the number of examples given to each method from 32 to 512, and evaluate their performance on 128 held-out data points. ... Each data point is a sequence of (x, y) pairs with length n ~ U[64, 800].
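The n ~ U[64, 800] sequence construction quoted above can be sketched as follows. This is a hypothetical helper (the paper's actual batching code is not available), and sampling with replacement is an assumption:

```python
import random


def sample_sequence(pairs, n_min=64, n_max=800):
    """Draw one training example: a sequence of (x, y) pairs whose
    length n is sampled uniformly from [n_min, n_max].
    """
    n = random.randint(n_min, n_max)
    # With replacement for simplicity; the quoted text does not specify
    # whether pairs may repeat within a sequence.
    return random.choices(pairs, k=n)
```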
Hardware Specification | Yes | All experiments in this paper are run on a cluster of 4 A6000 GPUs, each with 49GB of memory.
Software Dependencies | No | The paper mentions specific LLM models (e.g., Llama-2-7b, Qwen-1.5, Phi-2, T5-base, Nach0-base) and techniques (LoRA, Liger Kernel) with their respective publication years or specific model identifiers. However, it does not provide specific version numbers for general software libraries, programming languages, or frameworks (e.g., Python, PyTorch, CUDA), or other ancillary tools used for implementation.
Experiment Setup | Yes | We train LICO for 20000 iterations with a batch size of 4, where each data point is a sequence of (x, y) pairs sampled from an intrinsic or synthetic function. The ratio of synthetic data is 0.1. ... We use a base learning rate of 5e-4 with a linear warmup for 1000 steps and a cosine decay for the remaining 19000 steps. We use LoRA with a rank of 16 and α scale of 16. ... We initialize the observed dataset D_obs with a population of 34 molecules sampled randomly from ZINC. At each iteration, we use the best 34 candidates in D_obs to generate new candidates via crossover and mutation operations, with the mutation rate being 0.01. The candidate pool size C is 100. ... We set β = 10^b, where b ~ U[-0.5, 1.5]. We then pick k = 15 candidates with the highest utility scores.
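Two of the quoted hyperparameter choices can be sketched concretely: the warmup-plus-cosine learning-rate schedule, and the β-scaled candidate selection. Decaying to exactly zero, the UCB-style form of the utility (mean + β·std), and the minus sign in b ~ U[-0.5, 1.5] are assumptions in this sketch, not details confirmed by the quote:

```python
import math
import random


def lr_schedule(step, base_lr=5e-4, warmup=1000, total=20000):
    """Linear warmup for the first 1000 steps, then cosine decay over
    the remaining 19000 steps (decay-to-zero is an assumption)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def select_candidates(means, stds, k=15):
    """Pick the k candidates with the highest utility mean + beta * std,
    where beta = 10^b and b ~ U[-0.5, 1.5] is resampled each call."""
    beta = 10 ** random.uniform(-0.5, 1.5)
    utility = [m + beta * s for m, s in zip(means, stds)]
    return sorted(range(len(means)), key=lambda i: utility[i],
                  reverse=True)[:k]
```

Sampling β log-uniformly per iteration varies the exploration-exploitation trade-off across iterations instead of committing to one fixed coefficient.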