Weisfeiler and Leman Go Gambling: Why Expressive Lottery Tickets Win

Authors: Lorenz Kummer, Samir Moustafa, Anatol Ehrlich, Franka Bause, Nikolaus Suess, Wilfried N. Gansterer, Nils Morten Kriege

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our work contributes to closing this gap by providing both formal and empirical evidence that preserving expressivity in sparsely initialized GNNs is crucial for finding winning tickets. We structure our experiments to address the primary research question, which also drove our theoretical analysis, namely how the pre-training expressivity of a lottery ticket affects its post-training accuracy. We investigate this research question by utilizing 10 real-world datasets from the TUDataset repository (Morris et al., 2020)."
Researcher Affiliation | Academia | "¹Faculty of Computer Science, University of Vienna, Vienna, Austria ²Doctoral School Computer Science, University of Vienna, Vienna, Austria ³Research Network Data Science, University of Vienna, Vienna, Austria. Correspondence to: Lorenz Kummer <EMAIL>."
Pseudocode | No | No pseudocode block or algorithm section is present. The methodology is described in prose and mathematical equations.
Open Source Code | Yes | "The code for reproducing our results is available at GitHub: https://github.com/lorenz0890/wl2025lottery"
Open Datasets | Yes | "We investigate this research question by utilizing 10 real-world datasets from the TUDataset repository (Morris et al., 2020). These datasets, which are described in detail in Appendix B, are widely used in current studies."
Dataset Splits | No | No details about training, validation, and test splits (e.g., percentages, counts, or k-fold cross-validation) are explicitly given for the datasets used. The paper states, "Results aggregated from training 13,500 runs over 10 datasets," but does not elaborate on the splitting strategy for individual datasets.
Hardware Specification | Yes | "The experiments took approximately 8 weeks with three parallel workers to conclude and were conducted on a local server equipped with an NVIDIA H100 PCIe GPU (80GB VRAM), an Intel Xeon Gold 6326 CPU (500GB RAM) and a 1TB SSD."
Software Dependencies | No | No specific versions for software libraries such as PyTorch, TensorFlow, Python, or CUDA are mentioned. Only general components such as ReLU activations and the Adam optimizer are referred to, without version numbers.
Experiment Setup | Yes | "All models are trained for 250 epochs with a batch size of 32 and a learning rate of 0.01, using the Adam optimizer. In line with LTH, only non-zero weights are updated. We use ReLU activations and initialize network parameters W^(j) randomly from a uniform distribution U(-sqrt(1/m_j), sqrt(1/m_j)) with m_j = |I^(j)|, following common variance scaling initialization schemes (Glorot & Bengio, 2010; He et al., 2015)."
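The experiment setup quoted above combines two mechanics: variance-scaled uniform initialization with bound sqrt(1/m_j), and LTH-style training where pruned (zeroed) weights are frozen. A minimal pure-Python sketch of both, assuming a flat weight vector and a binary pruning mask; function names are illustrative and this is not the authors' implementation:

```python
import math
import random

def variance_scaled_init(fan_in, n, seed=0):
    """Draw n weights from U(-sqrt(1/fan_in), +sqrt(1/fan_in)),
    mirroring the variance scaling scheme quoted above."""
    rng = random.Random(seed)
    bound = math.sqrt(1.0 / fan_in)
    return [rng.uniform(-bound, bound) for _ in range(n)]

def masked_step(weights, mask, grads, lr=0.01):
    """One LTH-style gradient step: entries pruned by the mask (mask value 0)
    are held at zero; only surviving weights are updated."""
    return [(w - lr * g) if m else 0.0
            for w, m, g in zip(weights, mask, grads)]
```

For example, with `fan_in=4` every initial weight lies in [-0.5, 0.5], and a weight masked out before training remains exactly zero after every `masked_step`, which is what "only non-zero weights are updated" means in the LTH setting.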
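The expressivity notion referenced in the paper's title and research question is the one-dimensional Weisfeiler-Leman (1-WL) test, which bounds the distinguishing power of standard message-passing GNNs. A minimal sketch of 1-WL colour refinement for intuition (illustrative only, not the authors' code):

```python
from collections import Counter

def wl_refine(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman colour refinement.
    adj: dict mapping node -> list of neighbours; all nodes start with colour 0.
    Returns the final colour histogram of the graph."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        # signature = own colour plus sorted multiset of neighbour colours
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        # compress signatures back to small integer colours
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return Counter(colors.values())
```

Two graphs with different histograms are provably non-isomorphic; the classic limitation is that 1-WL gives two disjoint triangles and a 6-cycle identical histograms, illustrating why preserving (rather than degrading) this distinguishing power in sparsified GNNs matters.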