Does equivariance matter at scale?

Authors: Johann Brehmer, Sönke Behrends, Pim de Haan, Taco Cohen

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions.
Researcher Affiliation | Industry | Johann Brehmer (EMAIL), Qualcomm AI Research; Sönke Behrends, Qualcomm AI Research; Pim de Haan, Qualcomm AI Research; ... Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc. and/or its subsidiaries.
Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper mentions that the data can be generated with 'the openly available repository at https://github.com/google-research/kubric', but this refers to a third-party tool used for data generation, not the authors' implementation code for the models described in the paper.
Open Datasets | Yes | We construct a dataset of rigid-body interactions following a proposal by Allen et al. (2023). ... We recreate the MOVi-B dataset used by Allen et al. (2023) as best as we can, using parameters from their paper and private communication; see Appendix A for details. ... Our data can be generated with the openly available repository at https://github.com/google-research/kubric.
Dataset Splits | Yes | Our training set consists of 4 × 10^5 such trajectories, while we use 1000 trajectories each for the validation and test set.
Hardware Specification | No | The paper mentions 'GPU utilization' and 'inter-GPU communication' and shows figures for 'single GPU' setups, but it does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions using 'the Kubric simulator (Greff et al., 2022)', 'the PyBullet physics engine (Coumans & Bai, 2016–2024)', and 'the Adam optimizer (Kingma, 2015)'. However, it does not provide specific version numbers for these software components.
Experiment Setup | Yes | We train all models with the Adam optimizer (Kingma, 2015), annealing the learning rate over the course of training from an initial value of 5 × 10^-4 on a cosine schedule. ... a higher learning rate of 10^-3 or 2 × 10^-3 ... we use the same batch size of 64 samples ... Early stopping is used in all experiments. ... Our architectures are shown in Tbl. 1.
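The split sizes quoted under Dataset Splits (4 × 10^5 training trajectories, i.e. 400,000, plus 1000 each for validation and test) could be carved out of a pool of trajectory indices as sketched below. This is one simple way to produce splits of those sizes, not necessarily the paper's procedure; the function name and seed are hypothetical.

```python
import random

# Split sizes as quoted in the review: 4e5 train, 1,000 val, 1,000 test.
N_TRAIN, N_VAL, N_TEST = 400_000, 1_000, 1_000

def split_trajectories(n_total: int, n_val: int = N_VAL,
                       n_test: int = N_TEST, seed: int = 0):
    """Shuffle trajectory indices and carve off validation and test sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = n_total - n_val - n_test
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

train, val, test = split_trajectories(N_TRAIN + N_VAL + N_TEST)
```

A fixed seed keeps the assignment reproducible across runs, which matters when comparing models trained on the same data.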