The Generalized Skew Spectrum of Graphs

Authors: Armando Bellante, Martin Plávala, Alessandro Luongo

ICML 2025

Reproducibility assessment (variable, result, and supporting excerpt):
Research Type: Experimental. Evidence: "We illustrate our theoretical contributions with numerical experiments, demonstrating that our generalizations significantly improve the Skew Spectrum expressivity: distinguishing richer graphs, and distinguishing more non-isomorphic simple graphs at the same computational complexity." (Section 7, Numerical experiments)
Researcher Affiliation: Collaboration. Evidence: "1 Max-Planck-Institut für Quantenoptik, Hans-Kopfermann-Str. 1, 85748 Garching, Germany ... 6 Centre for Quantum Technologies, National University of Singapore, Singapore; 7 Inveriant Pte. Ltd., Singapore."
Pseudocode: Yes. Evidence: "Algorithm 1: Doubly-Reduced k-Spectrum; Algorithm 2: Precomputing s(k)"
Open Source Code: No. Evidence: "On the practical side, (1) developing a scalable, optimized open-source implementation with thorough benchmarking is a crucial step toward real-world adoption."
Open Datasets: Yes. Evidence: "We tested the Multi-Orbit generalization on the QM7 dataset, which contains Coulomb matrices of 7,165 molecules with up to 23 atoms (Rupp et al., 2012; Blum & Reymond, 2009). ... we extended the multi-orbit experiments on QM7 presented in Section 7.1 (Table 1) to two larger molecular datasets: QM9 and ZINC. Both datasets were loaded through torch.geometric. ... using two datasets of non-isomorphic, unweighted, undirected graphs: the Atlas of all graphs with 7 nodes, and a set of connected chordal graphs with 8 nodes (McKay)."
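The QM7 representation quoted above encodes each molecule as a Coulomb matrix (Rupp et al., 2012): diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. A minimal sketch of that encoding, using a toy water-like geometry (the charges and positions here are illustrative, not taken from the dataset):

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: C_ii = 0.5 * Z_i**2.4, C_ij = Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i, j] = 0.5 * Z[i] ** 2.4
            else:
                C[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return C

# Toy water-like molecule: one oxygen (Z=8) and two hydrogens (Z=1).
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0],
     [0.96, 0.0, 0.0],
     [-0.24, 0.93, 0.0]]
C = coulomb_matrix(Z, R)
```

The resulting matrix is symmetric by construction; QM7 ships precomputed matrices of this form padded to the maximum molecule size.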
Dataset Splits: Yes. Evidence: "We train several models on an 80%-20% split: Extreme Gradient Boosting (XGB), Gradient Boosting Regressor (GBR), Elastic Net (EN), Linear Regression (Linear) (Pedregosa et al., 2011). ... We used the first 100,000 molecules for training and the remaining 30,831 for testing. ZINC contains 249,456 molecules with up to 38 nodes. We used the default train/test/validation split from torch.geometric, training on 220,011 molecules and testing on 5,000, ignoring the validation set for simplicity. ... We experiment with two dropout rates: the original 0.5 and a lower value of 0.2."
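An 80%-20% split like the one quoted above is typically produced with scikit-learn's `train_test_split`. A minimal sketch on synthetic data (the feature matrix and regression target below are placeholders, not the paper's spectrum features):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for per-molecule feature vectors and a regression target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

# 80%-20% train/test split, matching the quoted protocol.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
```

Fixing `random_state` makes the split reproducible across runs, which is the detail a reproducibility checklist cares about here.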
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies: No. The paper mentions scikit-learn (Pedregosa et al., 2011) and torch.geometric but does not specify version numbers for these or other key software components used in its implementation.
Experiment Setup: Yes. Evidence: "A Random Forest classifier (60 estimators, no max depth) ... We trained multiple regression models on these representations: Extreme Gradient Boosting (XGB), Gradient Boosting Regressor (GBR), Elastic Net (EN), Linear Regression (Linear) (Pedregosa et al., 2011). ... including a learning rate of 0.001, a batch size of 32, and a maximum of 1000 training epochs. Early stopping is applied based on validation loss. We experiment with two dropout rates: the original 0.5 and a lower value of 0.2."
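The classifier configuration quoted above (60 estimators, no max depth) maps directly onto scikit-learn's `RandomForestClassifier`. A minimal sketch on synthetic labels (the data below is illustrative; the paper's inputs are graph spectrum features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for feature vectors with a simple separable label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Matches the quoted setup: 60 estimators, no max depth (max_depth=None).
clf = RandomForestClassifier(n_estimators=60, max_depth=None, random_state=0)
clf.fit(X, y)
train_acc = clf.score(X, y)
```

With `max_depth=None`, each tree grows until its leaves are pure, which is scikit-learn's default and matches "no max depth" in the quoted setup.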