AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly

Authors: Hongyu Guo, Yoshua Bengio, Shengchao Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental Empirical validation on the benchmarking crystallization dataset COD-Cluster17 shows that AssembleFlow significantly outperforms six competitive deep-learning baselines by at least 45% in assembly matching score while maintaining 100% molecular integrity. It also matches the assembly performance of a widely used domain-specific simulation tool while reducing computational cost by 25-fold. Furthermore, qualitative results, including atomic-collision properties of the predicted crystals, demonstrate AssembleFlow's effectiveness in preserving and modeling the rigidity of the molecular crystallization and assembly process.
Researcher Affiliation Academia Hongyu Guo, National Research Council Canada & University of Ottawa, EMAIL; Yoshua Bengio, Mila - Québec AI Institute, Université de Montréal, CIFAR AI Chair, EMAIL; Shengchao Liu, Université de Montréal, EMAIL
Pseudocode Yes A high-level overview and the pseudo-algorithm of AssembleFlow are provided in Algorithms 1 and 2 in Appendix E.3.
Open Source Code Yes The code and checkpoints are available at this GitHub repository.
Open Datasets Yes We evaluate our method using the crystallization dataset COD-Cluster17 (Liu et al., 2024c). COD-Cluster17 is a curated subset derived from the Crystallography Open Database (COD) (Grazulis et al., 2009).
Dataset Splits No We evaluate our method using the crystallization dataset COD-Cluster17 (Liu et al., 2024c). COD-Cluster17 contains 133K crystals and is a curated subset derived from the Crystallography Open Database (COD) (Grazulis et al., 2009). We consider three versions of COD-Cluster17, with 5k, 10k, and all data, respectively.
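Since the paper reports 5k/10k/full versions of COD-Cluster17 but no documented split procedure, a reproducer has to pick one. The sketch below shows one plausible convention, a seeded shuffle-then-slice over crystal IDs; the `make_subsets` helper and the shuffling strategy are assumptions, not the authors' protocol.

```python
import random

def make_subsets(crystal_ids, sizes=(5_000, 10_000), seed=0):
    """Hypothetical helper: carve fixed-size subsets out of the full
    COD-Cluster17 ID list. The shuffle-then-slice strategy is an
    assumption; the paper does not specify how its 5k/10k versions
    were selected."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    ids = list(crystal_ids)
    rng.shuffle(ids)
    subsets = {f"{n // 1000}k": ids[:n] for n in sizes}
    subsets["all"] = ids                # the full 133K-crystal version
    return subsets

splits = make_subsets(range(133_000))
print(len(splits["5k"]), len(splits["10k"]), len(splits["all"]))
```

Pinning the seed makes the subsets deterministic, which is the property the missing split documentation would otherwise leave unspecified.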
Hardware Specification No YB acknowledges support from NRC AI4D, CIFAR, and the CIFAR AI Chair program. This project's computational resources are provided by NRC and the Digital Research Alliance of Canada.
Software Dependencies No For each molecule in the cluster, we adopt the SE(3)-equivariant PaiNN (Schütt et al., 2021) to obtain the representation for each atom. ... The outputs include a molecular-level predicted rotation velocity q̂_θ ∈ ℝ^{M×3} and a predicted translation velocity x̂_θ ∈ ℝ^{M×3}, where M is the number of molecules in the cluster. ... Optimization: seed {0, 42, 123}, epochs {1000, 2000}, cutoff c {20, 50}, learning rate {1e-4, 5e-4}, optimizer {Adam}
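The output interface quoted above is easy to pin down even without the paper's architecture: per-molecule rotation and translation velocities, each of shape (M, 3), pooled from atom-level representations. The stub below illustrates only those shapes; the mean-pooling and the random linear heads are placeholders, not the SE(3)-equivariant PaiNN encoder the paper actually uses, and `predict_velocities` is a hypothetical name.

```python
import numpy as np

def predict_velocities(atom_repr, mol_index, M):
    """Illustrative stub of the output interface described in the
    excerpt: a per-molecule rotation velocity q_hat and translation
    velocity x_hat, each of shape (M, 3). The pooling and linear
    heads here are placeholders, not the paper's PaiNN-based model."""
    D = atom_repr.shape[1]
    pooled = np.zeros((M, D))
    counts = np.zeros(M)
    for i, m in enumerate(mol_index):   # mean-pool atom features per molecule
        pooled[m] += atom_repr[i]
        counts[m] += 1
    pooled /= counts[:, None]
    W_q = np.random.default_rng(0).standard_normal((D, 3))
    W_x = np.random.default_rng(1).standard_normal((D, 3))
    q_hat = pooled @ W_q                # rotation velocity, shape (M, 3)
    x_hat = pooled @ W_x                # translation velocity, shape (M, 3)
    return q_hat, x_hat

# Two molecules of five atoms each, 16-dim atom features
q, x = predict_velocities(np.ones((10, 16)), [0]*5 + [1]*5, M=2)
print(q.shape, x.shape)  # (2, 3) (2, 3)
```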
Experiment Setup Yes We provide the key hyper-parameters of AssembleFlow in Table 6. Table 6: Hyperparameter specifications for AssembleFlow.
- Intra-modeling (PaiNN): embedding dim {128}, num of layers {3}, cutoff {5}, readout {mean}
- Intra-modeling (Atomic Level): num of layers {2, 5}, num of convolutions {2}, num of heads {4, 8}, num of timesteps {50, 200}, α0 {1}, α1 {1, 10}
- Intra-modeling (Molecular Level): num of layers {4, 5}, num of heads {4, 8}, num of timesteps {50, 200}, α0 {1}, α1 {1, 10}
- Optimization: seed {0, 42, 123}, epochs {1000, 2000}, cutoff c {20, 50}, learning rate {1e-4, 5e-4}, optimizer {Adam}
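The braces in Table 6 list candidate values per hyperparameter. A minimal sketch of enumerating the optimization block as a grid, assuming a full Cartesian product (the paper does not state its actual tuning protocol):

```python
from itertools import product

# Search space transcribed from the optimization block of Table 6.
# Enumerating the full Cartesian product is an illustrative choice,
# not necessarily how the authors tuned their model.
grid = {
    "seed": [0, 42, 123],
    "epochs": [1000, 2000],
    "cutoff_c": [20, 50],
    "learning_rate": [1e-4, 5e-4],
    "optimizer": ["Adam"],
}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 3 * 2 * 2 * 2 * 1 = 24 combinations
```

Each entry in `configs` is a complete optimization configuration, ready to pass to a training script.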