AutoG: Towards automatic graph construction from tabular data
Authors: Zhikai Chen, Han Xie, Jian Zhang, Xiang Song, Jiliang Tang, Huzefa Rangwala, George Karypis
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate that the quality of constructed graphs is critical to downstream task performance, and AutoG can generate high-quality graphs that rival those produced by human experts. Code is available at https://github.com/amazon-science/Automatic-Table-to-Graph-Generation. |
| Researcher Affiliation | Collaboration | Zhikai Chen1, Han Xie2, Jian Zhang2, Xiang Song2, Jiliang Tang1, Huzefa Rangwala2, George Karypis2 1Michigan State University 2Amazon |
| Pseudocode | No | The paper describes methods and processes verbally and with a diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. The prompt demonstrations in Appendix D.1.2 are examples of prompts, not a formal algorithm. |
| Open Source Code | Yes | The code is available at https://github.com/amazon-science/Automatic-Table-to-Graph-Generation. |
| Open Datasets | Yes | Raw tabular datasets are extracted from Kaggle, Codalab, and other public sources to reflect real-world graph construction challenges. The datasets are collected from (1) the sources of existing tabular graph datasets, such as Outbrain (Wang et al., 2024c); (2) augmentations of existing tabular graph datasets, such as Stackexchange (Wang et al., 2024c); and (3) traditional tabular datasets adapted for graph construction, including IEEE-CIS (Howard et al., 2019) and MovieLens (Harper & Konstan, 2015). |
| Dataset Splits | No | The paper mentions using 'early-stage validation performance' and 'training and testing on a smaller graph' for its oracle design, but it does not explicitly provide the train/test/validation split percentages, sample counts, or a detailed splitting methodology for the datasets used in its experiments within the main text or appendices. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper states that Claude Sonnet-3.5 is used as the LLM backbone and refers to GML models such as RGCN, RGAT, HGT, and PNA. However, it does not provide version numbers for these GML models or for the underlying deep learning frameworks and libraries, which would be necessary for reproducible software dependencies. |
| Experiment Setup | No | The paper describes the general experimental framework and evaluation methods (e.g., using RGCN for performance, different oracle designs). However, it does not explicitly state specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed system-level training configurations for the GML models used in its experiments within the main text or provided appendices. |