Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching

Authors: Aaron J Havens, Benjamin Kurt Miller, Bing Yan, Carles Domingo-Enrich, Anuroop Sriram, Daniel S. Levine, Brandon M Wood, Bin Hu, Brandon Amos, Brian Karrer, Xiang Fu, Guan-Horng Liu, Ricky T. Q. Chen

ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate the effectiveness of our approach through extensive experiments on classical energy functions, and further scale up to neural network-based energy models where we perform amortized conformer generation across many molecular systems." |
| Researcher Affiliation | Collaboration | ¹University of Illinois Urbana-Champaign, ²FAIR at Meta, ³New York University, ⁴Microsoft Research New England. |
| Pseudocode | Yes | Algorithm 1: Adjoint Sampling. |
| Open Source Code | Yes | Code and benchmarks provided at github.com/facebookresearch/adjoint_sampling. |
| Open Datasets | Yes | "We are releasing the data we used in this paper as a challenging Conformation Benchmark with the goal of fostering development of scalable, amortized sampling algorithms." |
| Dataset Splits | Yes | "We split the SPICE molecules into a train set and an 80-molecule test set, allowing us to validate our sampler's ability to extrapolate to unseen molecules." |
| Hardware Specification | No | "Each GPU maintains its own buffer... across 8 GPUs. We use 8 GPUs with 1024×8 initial molecules..." The paper only mentions the number of GPUs used, not specific GPU models or types, which is required for a 'Yes' answer. |
| Software Dependencies | Yes | "We used the default hyperparameters from CREST version 3.0.2, including a 6 kcal/mol cutoff on final conformers." |
| Experiment Setup | Yes | "For the DW-4 energy we use an EGNN with 3 layers and 128 hidden features. We train Adjoint Sampling for 1000 outer iterations, generating 512 new samples and energy evaluations per iteration into a buffer of max size 10000. In each iteration we optimize on 500 batches of batch size 512 from the replay buffer, using a learning rate of 3×10⁻⁴. We use a geometric noise schedule with σmin = 10⁻⁴ and σmax = 3.0." |
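The geometric noise schedule quoted in the experiment setup can be sketched in a few lines. This is a minimal, hypothetical sketch: the excerpt only gives the endpoints σmin = 10⁻⁴ and σmax = 3.0, so the standard geometric interpolation σ(t) = σmin^(1−t) · σmax^t is an assumption, not the paper's confirmed parameterization.

```python
# Hyperparameters quoted in the excerpt's DW-4 setup.
SIGMA_MIN = 1e-4   # sigma_min from the reported noise schedule
SIGMA_MAX = 3.0    # sigma_max from the reported noise schedule


def geometric_sigma(t: float,
                    s_min: float = SIGMA_MIN,
                    s_max: float = SIGMA_MAX) -> float:
    """Geometric interpolation between s_min and s_max for t in [0, 1].

    Assumed form (not spelled out in the excerpt):
        sigma(t) = s_min ** (1 - t) * s_max ** t
    so sigma(0) = s_min and sigma(1) = s_max.
    """
    return s_min ** (1.0 - t) * s_max ** t


# Endpoints recover the configured bounds.
print(geometric_sigma(0.0))  # 1e-4
print(geometric_sigma(1.0))  # 3.0
print(geometric_sigma(0.5))  # geometric mean of the two bounds
```

Under this assumed form, the schedule is linear in log-space, which is why the midpoint is the geometric mean of σmin and σmax.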