Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging
Authors: Behrooz Tahmasebi, Stefanie Jegelka
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our findings reveal two distinct regimes where canonicalization may outperform or underperform compared to group averaging, with precise quantification of this phase transition in terms of sample size, group action characteristics, and a newly introduced concept of alignment. To the best of our knowledge, this study represents the first theoretical exploration of such behavior, offering insights into the relative effectiveness of canonicalization and group averaging under varying conditions." ... "5 EXPERIMENTS: We present proof-of-concept experiments in this section..." ... "Table 1: Final test loss averaged over ten different random seeds." |
| Researcher Affiliation | Academia | Behrooz Tahmasebi (MIT); Stefanie Jegelka (TUM and MIT) |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing access to source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The experiments use synthetically generated data: 'd-dimensional uniform data from the cube [-1, 1]^3' and 'point clouds, each consisting of m points in d-dimensional Euclidean space, modeled as elements of [-1, 1]^{m×d}'. No specific publicly available datasets are referenced with access information. |
| Dataset Splits | Yes | The training data consists of n = 100 independent and identically distributed samples uniformly drawn from [-1, 1]^3, each labeled with the optimal target function f and corrupted by Gaussian noise with a standard deviation of σ = 1. We run experiments based on the above setting with n = 100 training and test samples. The test loss is calculated over 100 uniformly random point clouds sampled from [-1, 1]^{m×d} as the test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions using a 'two-layer ReLU network' and 'SGD' for training, but does not provide specific version numbers for any software dependencies, libraries, or frameworks used. |
| Experiment Setup | Yes | Consider a linear model built on top of polynomial features of degree at most k = 3 over d = 3-dimensional uniform data... corrupted by Gaussian noise with a standard deviation of σ = 1... We train a two-layer ReLU network with a width of 20 on this dataset using mean-squared loss. Training is performed with SGD, using a learning rate of 0.01 for 100 epochs. For this experiment, we set m = d = 5 and report the average test loss along with the standard deviation over ten runs. |
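
The experiment setup quoted in the table (n = 100 uniform samples on [-1, 1]^3, Gaussian label noise with σ = 1, a two-layer ReLU network of width 20 trained with SGD at learning rate 0.01 for 100 epochs on mean-squared loss) can be sketched as follows. The target function f below is a hypothetical degree-3 polynomial chosen for illustration; the paper does not state its coefficients, and the initialization scheme is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data as described in the paper: n = 100 i.i.d. samples uniform on
# [-1, 1]^3, labels from a target f corrupted by Gaussian noise, sigma = 1.
n, d, sigma = 100, 3, 1.0
X = rng.uniform(-1.0, 1.0, size=(n, d))

def f(x: np.ndarray) -> np.ndarray:
    # Placeholder degree-3 polynomial target (coefficients are illustrative).
    return x[:, 0] ** 3 - 2.0 * x[:, 1] * x[:, 2] + x[:, 0]

y = f(X) + sigma * rng.normal(size=n)

# Two-layer ReLU network of width 20, trained with plain per-sample SGD
# (learning rate 0.01, 100 epochs) on the mean-squared loss.
width, lr, epochs = 20, 0.01, 100
W1 = rng.normal(size=(d, width)) / np.sqrt(d)   # assumed initialization
b1 = np.zeros(width)
w2 = rng.normal(size=width) / np.sqrt(width)
b2 = 0.0

for _ in range(epochs):
    for i in rng.permutation(n):                 # one SGD pass per epoch
        h = np.maximum(X[i] @ W1 + b1, 0.0)      # hidden ReLU activations
        pred = h @ w2 + b2
        g = 2.0 * (pred - y[i])                  # d(squared error)/d(pred)
        mask = (h > 0).astype(float)             # ReLU derivative
        # Compute both layers' gradients before updating either layer.
        grad_W1 = g * np.outer(X[i], w2 * mask)
        grad_b1 = g * (w2 * mask)
        w2 -= lr * g * h
        b2 -= lr * g
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1

h_all = np.maximum(X @ W1 + b1, 0.0)
train_mse = float(np.mean((h_all @ w2 + b2 - y) ** 2))
```

With σ = 1 label noise, the training MSE after fitting should settle around the noise floor rather than zero.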
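
For context on the two symmetrization schemes the paper compares, here is a minimal sketch for the permutation group S_m acting on point clouds in [-1, 1]^{m×d}: group averaging evaluates the base function on every group element and averages, while canonicalization evaluates it once on a fixed orbit representative. The base function and the sorting-based canonical map are illustrative assumptions, not the paper's constructions.

```python
import itertools
import numpy as np

m, d = 5, 5
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(m, d))  # one point cloud in [-1, 1]^{m x d}

def f(points: np.ndarray) -> float:
    # A deliberately non-symmetric base predictor: weights each point by its row index.
    return float(np.sum(np.arange(1, m + 1)[:, None] * points))

def group_average(points: np.ndarray) -> float:
    # f_avg(x) = (1/|G|) * sum_{g in G} f(g . x): exactly invariant,
    # but costs |S_m| = m! evaluations of f.
    perms = list(itertools.permutations(range(m)))
    return sum(f(points[list(p)]) for p in perms) / len(perms)

def canonicalize(points: np.ndarray) -> np.ndarray:
    # A fixed orbit representative: sort rows lexicographically
    # (first coordinate is the primary sort key).
    order = np.lexsort(points.T[::-1])
    return points[order]

def canonical_predict(points: np.ndarray) -> float:
    # f_can(x) = f(c(x)): a single evaluation of f on the representative.
    return f(canonicalize(points))

# Both constructions are invariant to shuffling the points.
shuffled = cloud[rng.permutation(m)]
avg_equal = np.isclose(group_average(cloud), group_average(shuffled))
can_equal = np.isclose(canonical_predict(cloud), canonical_predict(shuffled))
```

The trade-off visible here (m! evaluations versus one) is the computational side of the comparison; the paper's contribution is the statistical side, i.e. when each construction generalizes better.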