Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
Authors: Marvin F. Da Silva, Felix Dangel, Sageev Oore
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results on diagonal nets with synthetic data and show that our geodesic sharpness reveals strong correlation with generalization for real-world transformers on both text and image classification tasks. Sections 5.1.1, 5.3.1, and 5.3.2 are explicitly labeled "EMPIRICAL VALIDATION" for different models and datasets. |
| Researcher Affiliation | Collaboration | The authors are affiliated with "Dalhousie University, Halifax, Canada" (an academic institution) and the "Vector Institute for Artificial Intelligence, Toronto, Canada" (a research institute often associated with industry), indicating a collaboration. |
| Pseudocode | Yes | In Appendix C.2, the paper presents "Algorithm 1 Auto-PGD" which is a structured block of pseudocode for their method. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The paper uses well-known public datasets, citing them appropriately: "fine-tuning CLIP on ImageNet-1k (Radford et al., 2021)" and "BERT models that were fine-tuned on MNLI (Williams et al., 2018)." |
| Dataset Splits | Yes | The paper mentions using specific parts of standard datasets, implying their well-defined splits: the "ImageNet training set, divided into batches of 256" and the "MNLI dev matched set (Williams et al., 2018)." |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only acknowledges general funding sources (NSERC, CIFAR, and the Vector Institute). |
| Software Dependencies | No | The paper mentions using Auto-PGD (Croce & Hein, 2020) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper mentions general settings such as "batches of 256" for ImageNet and "batches of 128 points" for MNLI, and that models were "fine-tuned," but lacks specific hyperparameter values (e.g., learning rate, number of epochs, optimizer settings) or detailed configurations for the training process. For the CLIP ViT models, it even states they used "randomly selected hyperparameters." |
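For context on the pseudocode row above: Auto-PGD (Croce & Hein, 2020) is at its core a projected-gradient ascent method, which the paper adapts to search for loss-maximizing parameter perturbations when measuring sharpness. The sketch below is a rough illustration of that core idea only, not the paper's Algorithm 1; it omits Auto-PGD's adaptive step-size schedule and momentum, and all names (`pgd_sharpness`, `rho`, `lr`) are hypothetical.

```python
import numpy as np

def pgd_sharpness(loss_fn, grad_fn, theta, rho=0.05, steps=20, lr=0.01):
    """Basic PGD sharpness probe (illustrative sketch, not Auto-PGD):
    gradient ascent on the loss, projected onto an L2 ball of radius
    rho around the parameters theta."""
    delta = np.zeros_like(theta)
    for _ in range(steps):
        # Ascend the loss surface using a normalized gradient step.
        g = grad_fn(theta + delta)
        delta = delta + lr * g / (np.linalg.norm(g) + 1e-12)
        # Project the perturbation back onto the rho-ball.
        norm = np.linalg.norm(delta)
        if norm > rho:
            delta = delta * (rho / norm)
    # Sharpness estimate: worst-case loss increase found within the ball.
    return loss_fn(theta + delta) - loss_fn(theta)
```

On a simple convex loss such as `0.5 * ||x||^2`, the probe returns a small positive value, since any perturbation aligned with the gradient increases the loss.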