Flopping for FLOPs: Leveraging Equivariance for Computational Efficiency
Authors: Georg Bökman, David Nordström, Fredrik Kahl
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the effectiveness of flopping-equivariant networks. We first briefly discuss the setting of the experiments. Then we compare the efficiency and accuracy of equivariant versions of ResMLPs, ViTs and ConvNeXts from Section 4 to their non-equivariant counterparts. For a given architecture X, the equivariant version is E(X). E(X) always has around half the number of trainable parameters and FLOPs of X due to the block-diagonalization of the weight matrices (2). We will release code and weights at github.com/georg-bn/flopping-for-flops. |
| Researcher Affiliation | Academia | 1Chalmers University of Technology. Correspondence to: Georg Bökman <EMAIL>. |
| Pseudocode | No | The paper describes the mathematical foundations and architectural changes in detail, but it does not include any explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | We will release code and weights at github.com/georg-bn/flopping-for-flops. |
| Open Datasets | Yes | Dataset. We benchmark our model implementations on the ImageNet-1K dataset (Deng et al., 2009; Russakovsky et al., 2015; Recht et al., 2019), which includes 1.2M images evenly spread over 1,000 object categories. |
| Dataset Splits | Yes | Dataset. We benchmark our model implementations on the ImageNet-1K dataset (Deng et al., 2009; Russakovsky et al., 2015; Recht et al., 2019), which includes 1.2M images evenly spread over 1,000 object categories. |
| Hardware Specification | Yes | Hardware. All experiments were run on NVIDIA A100-40GB. The per-GPU batch size ranged from 64 (for larger models) to 256 (for smaller models). The biggest model requires training on 32 A100 GPUs for c. 2 days. |
| Software Dependencies | Yes | Software versioning. Our experiments build upon PyTorch (Paszke et al., 2019) and the timm (Wightman, 2019) library. We enable mixed-precision training using the deprecated NVIDIA library Apex; this is to mirror the training recipes of the benchmarks as closely as possible. To enable PyTorch's compiler, we use a modern version (≥ 2.0). Specifically, we use PyTorch 2.5.1 with CUDA 11.8. |
| Experiment Setup | Yes | Hyperparameters. We use the same training recipes as the baselines. The complete set of hyperparameters can be found in Table 2 in the appendix. ... Table 2: Training recipes for different model architectures. We try to, as closely as possible, replicate the training recipe of the baselines. |
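The quoted claim that the equivariant model E(X) has roughly half the parameters and FLOPs of X follows from block-diagonalizing each dense weight matrix into two independent blocks. A minimal NumPy sketch of that counting argument (illustrative only; the variable names and the two-block structure are assumptions for this example, not the paper's code):

```python
import numpy as np

# A dense d x d weight matrix has d^2 parameters, and a matrix-vector
# product with it costs d^2 multiply-adds. Replacing it with a
# block-diagonal matrix of two (d/2) x (d/2) blocks gives
# 2 * (d/2)^2 = d^2 / 2 parameters and multiply-adds: a 2x reduction.

d = 8
half = d // 2

dense_params = d * d                  # parameters in the dense matrix
blockdiag_params = 2 * half * half    # parameters in the two blocks

# Applying the block-diagonal matrix: each block acts on its own half
# of the input vector, so the two halves never mix.
W1 = np.random.randn(half, half)
W2 = np.random.randn(half, half)
x = np.random.randn(d)
y = np.concatenate([W1 @ x[:half], W2 @ x[half:]])

print(blockdiag_params / dense_params)  # -> 0.5
```

The same ratio holds for every linear layer that is block-diagonalized, which is why the halving applies to the network's total parameter and FLOP counts rather than to a single layer.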