Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
Authors: Marvin F. Da Silva, Felix Dangel, Sageev Oore
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results on diagonal nets with synthetic data and show that our geodesic sharpness reveals strong correlation with generalization for real-world transformers on both text and image classification tasks. Sections 5.1.1, 5.3.1, and 5.3.2 are explicitly labeled "EMPIRICAL VALIDATION" for different models and datasets. |
| Researcher Affiliation | Collaboration | The authors are affiliated with "Dalhousie University, Halifax, Canada" (an academic institution) and the "Vector Institute for Artificial Intelligence, Toronto, Canada" (a research institute often associated with industry), indicating a collaboration. |
| Pseudocode | Yes | In Appendix C.2, the paper presents "Algorithm 1 Auto-PGD" which is a structured block of pseudocode for their method. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The paper uses well-known public datasets, citing them appropriately: "fine-tuning CLIP on ImageNet-1k (Radford et al., 2021)" and "BERT models that were fine-tuned on MNLI (Williams et al., 2018)." |
| Dataset Splits | Yes | The paper mentions using specific parts of standard datasets, implying their well-defined splits: the "ImageNet training set, divided into batches of 256" and the "MNLI dev matched set (Williams et al., 2018)." |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only acknowledges general funding sources (NSERC, CIFAR, and the Vector Institute). |
| Software Dependencies | No | The paper mentions using Auto-PGD (Croce & Hein, 2020) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper mentions general settings such as "batches of 256" for ImageNet and "batches of 128 points" for MNLI, and that models were "fine-tuned," but lacks specific hyperparameter values (e.g., learning rate, number of epochs, optimizer settings) or detailed configurations for the training process. For the CLIP ViT models, it even states they used "randomly selected hyperparameters." |
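For context on the pseudocode row above: Auto-PGD (Croce & Hein, 2020) is at its core a projected-gradient ascent method, which the paper adapts to search for loss-maximizing parameter perturbations when measuring sharpness. The sketch below is a rough illustration of that core idea only, not the paper's Algorithm 1; it omits Auto-PGD's adaptive step-size schedule and momentum, and all names (`pgd_sharpness`, `rho`, `lr`) are hypothetical.

```python
import numpy as np

def pgd_sharpness(loss_fn, grad_fn, theta, rho=0.05, steps=20, lr=0.01):
    """Basic PGD sharpness probe (illustrative sketch, not Auto-PGD):
    gradient ascent on the loss, projected onto an L2 ball of radius
    rho around the parameters theta."""
    delta = np.zeros_like(theta)
    for _ in range(steps):
        # Ascend the loss surface using a normalized gradient step.
        g = grad_fn(theta + delta)
        delta = delta + lr * g / (np.linalg.norm(g) + 1e-12)
        # Project the perturbation back onto the rho-ball.
        norm = np.linalg.norm(delta)
        if norm > rho:
            delta = delta * (rho / norm)
    # Sharpness estimate: worst-case loss increase found within the ball.
    return loss_fn(theta + delta) - loss_fn(theta)
```

On a simple convex loss such as `0.5 * ||x||^2`, the probe returns a small positive value, since any perturbation aligned with the gradient increases the loss.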