Does equivariance matter at scale?
Authors: Johann Brehmer, Sönke Behrends, Pim De Haan, Taco Cohen
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. |
| Researcher Affiliation | Industry | Johann Brehmer (Qualcomm AI Research), Sönke Behrends (Qualcomm AI Research), Pim de Haan (Qualcomm AI Research), ... Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc. and/or its subsidiaries. |
| Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper mentions that the data can be generated with 'the openly available repository at https://github.com/google-research/kubric', but this refers to a third-party tool used for data generation, not the authors' implementation code for the models described in the paper. |
| Open Datasets | Yes | We construct a dataset of rigid-body interactions following a proposal by Allen et al. (2023). ... We recreate the MOVi-B dataset used by Allen et al. (2023) as best as we can, using parameters from their paper and private communication; see Appendix A for details. ... Our data can be generated with the openly available repository at https://github.com/google-research/kubric. |
| Dataset Splits | Yes | Our training set consists of 4 × 10⁵ such trajectories, while we use 1000 trajectories each for the validation and test set. |
| Hardware Specification | No | The paper mentions 'GPU utilization' and 'inter-GPU communication' and shows figures for 'single GPU' setups, but it does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using 'the Kubric simulator (Greff et al., 2022)', 'the PyBullet physics engine (Coumans & Bai, 2016–2024)', and 'the Adam optimizer (Kingma, 2015)'. However, it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We train all models with the Adam optimizer (Kingma, 2015), annealing the learning rate over the course of training from an initial value of 5 × 10⁻⁴ on a cosine schedule. ... a higher learning rate of 10⁻³ or 2 × 10⁻³ ... we use the same batch size of 64 samples ... Early stopping is used in all experiments. ... Our architectures are shown in Tbl. 1. |
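The cosine learning-rate schedule quoted in the experiment-setup cell can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the paper only states the initial value of 5 × 10⁻⁴ and a cosine anneal, so the function name, the final learning rate of 0, and the step/total-step parameterization are assumptions.

```python
import math

def cosine_lr(step, total_steps, lr_init=5e-4, lr_final=0.0):
    """Anneal the learning rate from lr_init to lr_final on a cosine schedule.

    At step 0 this returns lr_init; at step == total_steps it returns lr_final,
    following the standard half-cosine decay curve in between.
    """
    progress = step / total_steps
    return lr_final + 0.5 * (lr_init - lr_final) * (1.0 + math.cos(math.pi * progress))

# Example: the rate starts at 5e-4, halves at the midpoint, and decays to 0.
start = cosine_lr(0, 10_000)      # 5e-4
middle = cosine_lr(5_000, 10_000) # ~2.5e-4
end = cosine_lr(10_000, 10_000)   # ~0.0
```

In practice such a schedule is typically attached to an Adam optimizer via the training framework's scheduler API (e.g. a per-step callback that sets the optimizer's learning rate to `cosine_lr(step, total_steps)`).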