Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging
Authors: Behrooz Tahmasebi, Stefanie Jegelka
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our findings reveal two distinct regimes where canonicalization may outperform or underperform compared to group averaging, with precise quantification of this phase transition in terms of sample size, group action characteristics, and a newly introduced concept of alignment. To the best of our knowledge, this study represents the first theoretical exploration of such behavior, offering insights into the relative effectiveness of canonicalization and group averaging under varying conditions." ... "5 EXPERIMENTS: We present proof-of-concept experiments in this section..." ... "Table 1: Final test loss averaged over ten different random seeds." |
| Researcher Affiliation | Academia | Behrooz Tahmasebi (MIT); Stefanie Jegelka (TUM and MIT) |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing access to source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The experiments use synthetically generated data: 'd-dimensional uniform data from the cube [-1, 1]^3' and 'point clouds, each consisting of m points in d-dimensional Euclidean space, modeled as elements of [-1, 1]^{m×d}'. No specific publicly available datasets are referenced with access information. |
| Dataset Splits | Yes | The training data consists of n = 100 independent and identically distributed samples uniformly drawn from [-1, 1]^3, each labeled with the optimal target function f and corrupted by Gaussian noise with a standard deviation of σ = 1. We run experiments based on the above setting with n = 100 training and test samples. The test loss is calculated over 100 uniformly random point clouds sampled from [-1, 1]^{m×d} as the test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions using a 'two-layer ReLU network' and 'SGD' for training, but does not provide specific version numbers for any software dependencies, libraries, or frameworks used. |
| Experiment Setup | Yes | Consider a linear model built on top of polynomial features of degree at most k = 3 over d = 3-dimensional uniform data... corrupted by Gaussian noise with a standard deviation of σ = 1... We train a two-layer ReLU network with a width of 20 on this dataset using mean-squared loss. Training is performed with SGD, using a learning rate of 0.01 for 100 epochs. For this experiment, we set m = d = 5 and report the average test loss along with the standard deviation over ten runs. |
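
The experiment setup quoted in the table (n = 100 uniform samples on [-1, 1]^3, Gaussian label noise with σ = 1, a two-layer ReLU network of width 20 trained with SGD at learning rate 0.01 for 100 epochs on mean-squared loss) can be sketched as follows. The target function f below is a hypothetical degree-3 polynomial chosen for illustration; the paper does not state its coefficients, and the initialization scheme is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data as described in the paper: n = 100 i.i.d. samples uniform on
# [-1, 1]^3, labels from a target f corrupted by Gaussian noise, sigma = 1.
n, d, sigma = 100, 3, 1.0
X = rng.uniform(-1.0, 1.0, size=(n, d))

def f(x: np.ndarray) -> np.ndarray:
    # Placeholder degree-3 polynomial target (coefficients are illustrative).
    return x[:, 0] ** 3 - 2.0 * x[:, 1] * x[:, 2] + x[:, 0]

y = f(X) + sigma * rng.normal(size=n)

# Two-layer ReLU network of width 20, trained with plain per-sample SGD
# (learning rate 0.01, 100 epochs) on the mean-squared loss.
width, lr, epochs = 20, 0.01, 100
W1 = rng.normal(size=(d, width)) / np.sqrt(d)   # assumed initialization
b1 = np.zeros(width)
w2 = rng.normal(size=width) / np.sqrt(width)
b2 = 0.0

for _ in range(epochs):
    for i in rng.permutation(n):                 # one SGD pass per epoch
        h = np.maximum(X[i] @ W1 + b1, 0.0)      # hidden ReLU activations
        pred = h @ w2 + b2
        g = 2.0 * (pred - y[i])                  # d(squared error)/d(pred)
        mask = (h > 0).astype(float)             # ReLU derivative
        # Compute both layers' gradients before updating either layer.
        grad_W1 = g * np.outer(X[i], w2 * mask)
        grad_b1 = g * (w2 * mask)
        w2 -= lr * g * h
        b2 -= lr * g
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1

h_all = np.maximum(X @ W1 + b1, 0.0)
train_mse = float(np.mean((h_all @ w2 + b2 - y) ** 2))
```

With σ = 1 label noise, the training MSE after fitting should settle around the noise floor rather than zero.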
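
For context on the two symmetrization schemes the paper compares, here is a minimal sketch for the permutation group S_m acting on point clouds in [-1, 1]^{m×d}: group averaging evaluates the base function on every group element and averages, while canonicalization evaluates it once on a fixed orbit representative. The base function and the sorting-based canonical map are illustrative assumptions, not the paper's constructions.

```python
import itertools
import numpy as np

m, d = 5, 5
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(m, d))  # one point cloud in [-1, 1]^{m x d}

def f(points: np.ndarray) -> float:
    # A deliberately non-symmetric base predictor: weights each point by its row index.
    return float(np.sum(np.arange(1, m + 1)[:, None] * points))

def group_average(points: np.ndarray) -> float:
    # f_avg(x) = (1/|G|) * sum_{g in G} f(g . x): exactly invariant,
    # but costs |S_m| = m! evaluations of f.
    perms = list(itertools.permutations(range(m)))
    return sum(f(points[list(p)]) for p in perms) / len(perms)

def canonicalize(points: np.ndarray) -> np.ndarray:
    # A fixed orbit representative: sort rows lexicographically
    # (first coordinate is the primary sort key).
    order = np.lexsort(points.T[::-1])
    return points[order]

def canonical_predict(points: np.ndarray) -> float:
    # f_can(x) = f(c(x)): a single evaluation of f on the representative.
    return f(canonicalize(points))

# Both constructions are invariant to shuffling the points.
shuffled = cloud[rng.permutation(m)]
avg_equal = np.isclose(group_average(cloud), group_average(shuffled))
can_equal = np.isclose(canonical_predict(cloud), canonical_predict(shuffled))
```

The trade-off visible here (m! evaluations versus one) is the computational side of the comparison; the paper's contribution is the statistical side, i.e. when each construction generalizes better.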