Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data

Authors: Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianping Fan

ICML 2025

Reproducibility (Variable / Result / LLM Response)

Research Type: Experimental
    "Motivated by these observations, we embarked on an empirical study from the perspective of attention mechanisms to explore how LLMs process graph-structured data. The goal is to gain deeper insights into the attention behavior of LLMs over graph structures. We uncovered unique phenomena regarding how LLMs apply attention to graph-structured data and analyzed these findings to improve the modeling of such data by LLMs."

Researcher Affiliation: Collaboration
    "1 College of Management and Economics, Tianjin University, Tianjin, China; 2 Laboratory of Computation and Analytics of Complex Management Systems, Tianjin University, Tianjin, China; 3 ai-deepcube; 4 AI Lab at Lenovo Research, Beijing, China. Correspondence to: Hongke Zhao <EMAIL>."

Pseudocode: No
    The paper describes methodologies in prose but does not include any clearly labeled pseudocode or algorithm blocks with structured steps.

Open Source Code: Yes
    Source code: LLM4Exploration

Open Datasets: Yes
    "When selecting datasets, we considered a broad spectrum and chose heterogeneous graph datasets Amazon-Rating (Platonov et al., 2023) and Roman-Empire (Platonov et al., 2023), as well as homogeneous graph datasets Pubmed and Wiki CS (Mernyei & Cangea, 2020)."

Dataset Splits: No
    The paper mentions using standard benchmark datasets but does not explicitly provide the training, validation, and test splits (e.g., percentages, sample counts, or references to predefined splits) used for the experiments.

Hardware Specification: No
    The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments.

Software Dependencies: No
    The paper mentions using "LLama2-7B (Touvron et al., 2023) as our base model" but does not specify versions for any other ancillary software components such as programming languages or libraries.

Experiment Setup: Yes
    "Table 5. Hyperparameters for Roman-Empire, Amazon-Ratings, Pubmed, and WikiCS:
        learning rate: 1e-4
        warmup: 0.05
        gradient accumulation steps: 8
        batch size: 4
        epochs: 1
        num beams: 2
        use embedding: True / True / False / False (Roman-Empire / Amazon-Ratings / Pubmed / WikiCS)"
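The Table 5 hyperparameters can be collected into a minimal configuration sketch. This is an illustrative assumption about how one might organize the reported values, not the authors' actual code; the dict names and the per-dataset mapping of the "use embedding" flags (in the order the table caption lists the datasets) are assumptions.

```python
# Hyperparameters as reported in Table 5 of the paper.
# Names and structure are illustrative, not taken from the released code.
TRAIN_CONFIG = {
    "learning_rate": 1e-4,
    "warmup_ratio": 0.05,
    "gradient_accumulation_steps": 8,
    "batch_size": 4,
    "epochs": 1,
    "num_beams": 2,  # beam width at generation time
}

# "use embedding" flags, assumed to follow the caption's dataset order.
USE_EMBEDDING = {
    "Roman-Empire": True,
    "Amazon-Ratings": True,
    "Pubmed": False,
    "WikiCS": False,
}

# With gradient accumulation, the effective batch size per optimizer
# step is batch_size * gradient_accumulation_steps.
effective_batch = (
    TRAIN_CONFIG["batch_size"] * TRAIN_CONFIG["gradient_accumulation_steps"]
)
print(effective_batch)  # 32
```

The effective batch size of 32 (4 per device, accumulated over 8 steps) is a common way to fit 7B-scale fine-tuning into limited GPU memory, though the paper does not state this motivation explicitly.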