AutoG: Towards automatic graph construction from tabular data
Authors: Zhikai Chen, Han Xie, Jian Zhang, Xiang Song, Jiliang Tang, Huzefa Rangwala, George Karypis
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate that the quality of constructed graphs is critical to downstream task performance, and AutoG can generate high-quality graphs that rival those produced by human experts. Code is available at https://github.com/amazon-science/Automatic-Table-to-Graph-Generation. |
| Researcher Affiliation | Collaboration | Zhikai Chen1, Han Xie2, Jian Zhang2, Xiang Song2, Jiliang Tang1, Huzefa Rangwala2, George Karypis2 1Michigan State University 2Amazon |
| Pseudocode | No | The paper describes methods and processes verbally and with a diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. The prompt demonstrations in Appendix D.1.2 are examples of prompts, not a formal algorithm. |
| Open Source Code | Yes | The code is available at https://github.com/amazon-science/Automatic-Table-to-Graph-Generation. |
| Open Datasets | Yes | Raw tabular datasets are extracted from Kaggle, Codalab, and other public sources to reflect real-world graph construction challenges. The datasets are collected from (1) the sources of existing tabular graph datasets, such as Outbrain (Wang et al., 2024c); (2) augmentations of existing tabular graph datasets, such as Stackexchange (Wang et al., 2024c); and (3) traditional tabular datasets adapted for graph construction, including IEEE-CIS (Howard et al., 2019) and MovieLens (Harper & Konstan, 2015). |
| Dataset Splits | No | The paper mentions using 'early-stage validation performance' and 'training and testing on a smaller graph' for its oracle design, but it does not explicitly provide the train/test/validation split percentages, sample counts, or a detailed splitting methodology for the datasets used in its experiments within the main text or appendices. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper states that Claude Sonnet-3.5 is used as the LLM backbone and refers to GML models such as RGCN, RGAT, HGT, and PNA. However, it does not provide version numbers for these GML models or for the underlying deep learning frameworks and libraries, which would be necessary for reproducible software dependencies. |
| Experiment Setup | No | The paper describes the general experimental framework and evaluation methods (e.g., using RGCN for performance, different oracle designs). However, it does not explicitly state specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed system-level training configurations for the GML models used in its experiments within the main text or provided appendices. |