Algorithms and Hardness for Active Learning on Graphs

Authors: Vincent Cohen-Addad, Silvio Lattanzi, Simon Meierhans

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We complement our theoretical results with proof-of-concept experiments on both synthetic and real world graphs showing that our algorithm has better experimental performances than the heuristics presented in (Guillory & Bilmes, 2009; Cesa-Bianchi et al., 2010).
Researcher Affiliation Collaboration 1Google 2ETH Zurich, Switzerland.
Pseudocode Yes Algorithm 1 T-GLS(G, τ, k)
Open Source Code Yes We provide the code used for running the experiments under https://github.com/mesimon/graph_label_selection.
Open Datasets Yes We then explore the full trade-off between budget k and Ψ(L) for the Davis southern woman graph (Davis et al., 1941). Finally, we run experiments on two real world graphs from the Stanford Network Analysis Project (SNAP) (Leskovec & Krevl, 2014).
Dataset Splits No The paper focuses on an active learning problem where the goal is to select 'k' vertices whose labels are best suited for predicting other labels. The concept of train/test/validation splits in the traditional sense for model training is not applicable here as the datasets are graphs where vertices are selected, not partitioned into predefined training, testing, or validation sets for a supervised learning task. The paper describes using a 'budget k' to select vertices, which is not a dataset split.
Hardware Specification No The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies No The paper does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that would be needed to reproduce the experimental environment.
Experiment Setup Yes For all experiments, we run the algorithm of (Guillory & Bilmes, 2009) for maximizing Ψ( ) and the algorithm of (Cesa-Bianchi et al., 2010) 10 times and report the mean and standard deviation for the achieved ratio Ψ(L). Since our algorithm is deterministic, we simply report the ratio. We don t allow any slack in the budget k when evaluating the algorithms.