Training Dynamics of Learning 3D-Rotational Equivariance
Authors: Max W Shen, Ewa Nowara, Michael Maser, Kyunghyun Cho
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We focus our empirical investigation on 3D-rotation equivariance in high-dimensional molecular tasks (flow matching, force field prediction, denoising voxels) and find that models quickly reduce equivariance error to 2% of held-out loss within 1k-10k training steps, a result robust to model and dataset size. This happens because learning 3D-rotational equivariance is an easier learning task, with a smoother and better-conditioned loss landscape, than the main prediction task. For 3D rotations, the loss penalty for non-equivariant models is small throughout training, so they may achieve lower test loss than equivariant models per GPU-hour unless the equivariant efficiency gap is narrowed. We also experimentally and theoretically investigate the relationships between relative equivariance error, learning gradients, and model parameters. To gain insight into the empirical learning behavior of non-equivariant models, we apply our loss decomposition framework to three high-dimensional learning problems on 3D molecules, each with a distinct task and a modern non-equivariant model architecture. For each task, we follow the standard training procedure described in its original publication. We trained EScAIP 6M on a subset of SPICE with 950k training examples used by Qu & Krishnapriyan (2024) for 30 epochs with batch size 64. |
| Researcher Affiliation | Collaboration | Max W. Shen (Genentech Computational Sciences), Ewa M. Nowara (Genentech Computational Sciences), Michael Maser (Genentech Computational Sciences), Kyunghyun Cho (Genentech Computational Sciences & New York University) |
| Pseudocode | No | The paper includes mathematical derivations, proofs, and figures, but no explicitly labeled pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | Code Availability: We provide code at https://github.com/genentech/equivariance_learning. Our code simply adds callbacks to compute equivariance metrics during training on top of the original EScAIP, Proteína, and VoxMol codebases. |
| Open Datasets | Yes | We trained EScAIP 6M on a subset of SPICE with 950k training examples used by Qu & Krishnapriyan (2024)... SPICE is a dataset of small-molecule 3D conformers with energies and forces computed by quantum-mechanical density functional theory (Eastman et al., 2024). We trained Proteína at 60M without triangular attention and 400M with triangular attention on the full Protein Data Bank (PDB) dataset with 225k training examples. We trained VoxMol 111M on GEOM-drugs, a dataset of 3D structures of drug-like molecules with 1.1M training examples. |
| Dataset Splits | Yes | We trained EScAIP 6M on a subset of SPICE with 950k training examples... We varied ... training set size across 950k, 50k, 5k, and 500 (with batch size 1)... We trained Proteína at 60M ... on the full Protein Data Bank (PDB) dataset with 225k training examples. We also trained models on 1% of the PDB with 2k examples and 0.1% with 200 examples. We trained VoxMol 111M on GEOM-drugs... with 1.1M training examples. We also trained models on 1% (11k), 10% (110k), 25% (275k), and 50% (550k) examples... We report both of these metrics, as well as the percentage of the total loss attributable to the model's lack of equivariance, on a held-out dataset during training. |
| Hardware Specification | No | The paper mentions 'GPU-hours' in the context of efficiency, but does not provide specific details on the GPU models, CPUs, or other hardware used for running their experiments. |
| Software Dependencies | No | The paper mentions using existing codebases like EScAIP, Proteína, and VoxMol, but does not specify any software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | We trained EScAIP 6M on a subset of SPICE with 950k training examples used by Qu & Krishnapriyan (2024) for 30 epochs with batch size 64. SPICE is a dataset of small-molecule 3D conformers with energies and forces computed by quantum-mechanical density functional theory (Eastman et al., 2024). We varied model size across 1M, 4M, and 6M, varied training set size across 950k, 50k, 5k, and 500 (with batch size 1), and varied the optimizer or learning rate. We trained Proteína at 60M without triangular attention and 400M with triangular attention on the full Protein Data Bank (PDB) dataset with 225k training examples. We trained VoxMol 111M on GEOM-drugs, a dataset of 3D structures of drug-like molecules with 1.1M training examples. We also trained models on 1% (11k), 10% (110k), 25% (275k), and 50% (550k) examples, and models of varying size: full (111M parameters), small (28M), and tiny (7M). In this autoencoding task, an equivariant model would output a rotated predicted reconstruction when the input molecule rotates; this is a commonly desired property when using the decoder as a generative model (Pinheiro et al., 2023). We follow the same training recipe as the original repository, which does not use data augmentation. In the augmented runs, data augmentation is performed by applying random rotations and translations to each sample. |
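The equivariance metric quoted above (the share of loss attributable to a model's lack of equivariance) rests on comparing f(Rx) against R f(x) for random rotations R. A minimal numpy sketch of that comparison, assuming point-cloud inputs of shape (N, 3); the function names here are illustrative, not taken from the released codebase:

```python
import numpy as np

def random_rotation(rng):
    # Draw a Haar-uniform 3D rotation via QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))   # fix column signs to make the decomposition unique
    if np.linalg.det(q) < 0:   # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

def equivariance_error(f, x, n_rotations=8, seed=0):
    """Mean squared gap between f(x R^T) and f(x) R^T over random rotations R."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_rotations):
        R = random_rotation(rng)
        gaps.append(np.mean((f(x @ R.T) - f(x) @ R.T) ** 2))
    return float(np.mean(gaps))
```

An exactly equivariant map (e.g. the identity) gives zero error, while a generic non-equivariant map gives a positive value; in the paper's framework this quantity would be tracked on held-out data during training and normalized by the total loss.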
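The rotation-and-translation augmentation mentioned for the augmented training runs can be sketched as below, again assuming (N, 3) coordinate arrays; the helper name is hypothetical and the translation range is an arbitrary placeholder:

```python
import numpy as np

def augment(coords, rng, max_shift=1.0):
    """Apply a random rigid motion (rotation + translation) to an (N, 3) array."""
    # Haar-uniform rotation via QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    t = rng.uniform(-max_shift, max_shift, size=3)  # placeholder shift range
    return coords @ q.T + t
```

Because the transform is rigid, pairwise interatomic distances are preserved, which is a cheap sanity check when wiring such augmentation into a training loop.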