Geometry of Lightning Self-Attention: Identifiability and Dimension
Authors: Nathan Henry, Giovanni Luca Marchetti, Kathlén Kohn
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evidence for Conjecture 3.10. To this end, we implement a deep attention network with softmax normalization (i.e., S(x) = e^x) and estimate the dimension of its neuromanifold. The results are visualized in Figure 3 for a deep attention network with l = 2 layers, t = 3, a_i = 2 for all i, and d_i = δ varying from 3 to 10. The plot shows both the dimension estimated via the numerical approach ("Estimated") and the one computed via Equation 16 ("Expected"). The two values coincide for all δ, confirming Conjecture 3.10 empirically. |
| Researcher Affiliation | Academia | Nathan W. Henry (University of Toronto), Giovanni Luca Marchetti (Royal Institute of Technology, KTH), Kathlén Kohn (Royal Institute of Technology, KTH) |
| Pseudocode | No | The paper describes mathematical proofs and derivations using equations and prose, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our Python code is available at a public repository: https://github.com/giovanni-marchetti/NeuroDim |
| Open Datasets | No | To this end, we implement a deep attention network with softmax normalization (i.e., S(x) = e^x), and estimate the dimension of its neuromanifold. The latter is a subtle problem since, differently from the lightning case, the neuromanifold is not a priori embedded in a finite-dimensional vector space. Therefore, we rely on a stochastic finite element approach by randomly sampling N = 250 input points in R^(d_0 × t) from a normal distribution and restricting φ_W to this finite space. |
| Dataset Splits | No | Therefore, we rely on a stochastic finite element approach by randomly sampling N = 250 input points in R^(d_0 × t) from a normal distribution and restricting φ_W to this finite space. This text describes the generation of synthetic data, not the splitting of an existing dataset into training, validation, or test sets. |
| Hardware Specification | No | The paper does not mention any specific hardware (e.g., CPU, GPU models, or cloud computing resources) used for running the numerical verifications. |
| Software Dependencies | No | The paper mentions 'Our Python code is available at a public repository' but does not specify any particular software versions (e.g., Python version, library versions like PyTorch, TensorFlow, etc.) used in the implementation. |
| Experiment Setup | Yes | The results are visualized in Figure 3 for a deep attention network with l = 2 layers, t = 3, ai = 2 for all i, and di = δ varying from 3 to 10. |
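The dimension-estimation procedure quoted above can be illustrated with a short sketch. This is not the authors' code: it assumes a single softmax self-attention layer with hypothetical weight matrices `Wq`, `Wk`, `Wv`, a smaller sample size than the paper's N = 250, and a simple relative singular-value cutoff for the numerical rank. The idea matches the quoted description: restrict the parametrization map φ_W to a finite random sample of inputs, then estimate the neuromanifold's dimension as the rank of its Jacobian at a generic weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small configuration: t tokens, input/output dim d, head dim a.
t, d, a = 3, 3, 2
N = 50  # number of random input points (the paper samples N = 250)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """One softmax self-attention layer applied to X of shape (t, d)."""
    A = softmax((X @ Wq) @ (X @ Wk).T)  # (t, t) attention matrix
    return A @ (X @ Wv)                 # (t, d) output tokens

n_params = 2 * d * a + d * d            # entries of Wq, Wk, Wv

def phi(w, Xs):
    """Restriction of the network map to the finite input sample:
    unpack the flat weight vector and stack all outputs."""
    Wq = w[:d * a].reshape(d, a)
    Wk = w[d * a:2 * d * a].reshape(d, a)
    Wv = w[2 * d * a:].reshape(d, d)
    return np.concatenate([attention(X, Wq, Wk, Wv).ravel() for X in Xs])

Xs = [rng.standard_normal((t, d)) for _ in range(N)]
w0 = rng.standard_normal(n_params)      # generic point in weight space

# Central-difference Jacobian of phi at w0.
eps = 1e-5
cols = []
for i in range(n_params):
    e_i = np.zeros(n_params)
    e_i[i] = eps
    cols.append((phi(w0 + e_i, Xs) - phi(w0 - e_i, Xs)) / (2 * eps))
J = np.stack(cols, axis=1)              # shape (N * t * d, n_params)

# Estimated neuromanifold dimension = numerical rank of the Jacobian.
sv = np.linalg.svd(J, compute_uv=False)
dim = int((sv > 1e-4 * sv[0]).sum())
print(f"parameters: {n_params}, estimated neuromanifold dimension: {dim}")
```

For a single layer the estimated dimension should fall strictly below the raw parameter count, since the output depends on Wq and Wk only through the product Wq Wk^T (reparametrizing Wq → Wq G, Wk → Wk G^(-T) leaves the map unchanged); the gap is what a fiber-dimension formula like the paper's Equation 16 accounts for.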