GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation

Authors: Tao Feng, Yihang Sun, Jiaxuan You

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on two datasets show GraphEval improves F1 scores by at least 14% with low computation and API costs. Additionally, GraphEval can effectively detect plagiarized ideas.
Researcher Affiliation Academia Tao Feng (1*), Yihang Sun (2*), Jiaxuan You (1). 1: University of Illinois at Urbana-Champaign; 2: Peking University.
Pseudocode Yes Algorithm 1 Training of GraphEval. Require: dataset D_train = {(x, y)}; a weighted GNN f_φ; edge weights w_v; number of GNN layers L.
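The quantities named in Algorithm 1 (node features, per-edge weights w_v, and L message-passing layers) can be illustrated with a minimal sketch of one forward pass of a weighted GNN. Plain Python stands in for the paper's PyTorch/PyG implementation; the function name, aggregation rule, and toy graph below are illustrative assumptions, not the authors' code.

```python
def weighted_gnn_forward(features, edges, weights, num_layers=2):
    """features: {node: float}; edges: list of (u, v) pairs;
    weights: {(u, v): float edge weight}; num_layers: L."""
    h = dict(features)
    for _ in range(num_layers):
        new_h = {}
        for node in h:
            # Weighted aggregation over incoming neighbors, plus a self-loop.
            total, norm = h[node], 1.0
            for (u, v) in edges:
                if v == node:
                    w = weights[(u, v)]
                    total += w * h[u]
                    norm += w
            new_h[node] = total / norm  # normalized (mean-style) update
        h = new_h
    return h

# Toy viewpoint graph: three nodes, two weighted edges.
out = weighted_gnn_forward(
    features={0: 1.0, 1: 0.0, 2: 0.0},
    edges=[(0, 1), (1, 2)],
    weights={(0, 1): 0.5, (1, 2): 0.5},
)
```

With L = 2 layers, information from node 0 reaches node 2, which is the point of stacking message-passing layers; the real model additionally learns the layer parameters by gradient descent on (x, y) pairs from D_train.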
Open Source Code Yes Our code for GraphEval is released at https://github.com/ulab-uiuc/GraphEval.
Open Datasets Yes ICLR Papers: We collect abstracts and review decisions from paper submissions to the ICLR conferences between 2021 and 2023. From this, we randomly select 300 papers as the training set for learning-based methods and 50 papers as the test set. AI Researcher Dataset: We use the dataset collected by Si et al. (2024) in AI Researcher as an additional test set, which contains academic papers focusing on the domain of novel prompting methods.
Dataset Splits Yes ICLR Papers: From this, we randomly select 300 papers as the training set for learning-based methods and 50 papers as the test set. AI Researcher Dataset: For testing other methods, we split the dataset into training and testing sets in an 85%:15% ratio and conduct multiple experiments to average the results, thereby reducing bias. ASAP-Review dataset: We divided the dataset into training, validation, and test sets in the proportions of 70%, 10%, and 20%, respectively.
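The split protocols quoted above (85%:15% for the AI Researcher set, 70%/10%/20% for ASAP-Review) amount to a shuffle-then-chunk procedure. A hedged sketch, assuming a simple seeded shuffle; the helper name and rounding convention are hypothetical, not taken from the paper:

```python
import random

def split_dataset(items, fractions, seed=0):
    """Shuffle items, then cut them into consecutive chunks by fraction.
    The last chunk absorbs any rounding remainder."""
    rng = random.Random(seed)
    items = items[:]
    rng.shuffle(items)
    splits, start = [], 0
    for frac in fractions[:-1]:
        end = start + int(round(frac * len(items)))
        splits.append(items[start:end])
        start = end
    splits.append(items[start:])
    return splits

# 70%/10%/20% split as described for the ASAP-Review dataset.
papers = list(range(1000))
train, val, test = split_dataset(papers, [0.70, 0.10, 0.20])
```

Re-running with different seeds and averaging, as the paper describes for the 85%:15% case, reduces the variance introduced by any single random partition.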
Hardware Specification Yes Our proposed method is implemented using PyTorch and PyTorch Geometric (PyG), with all experiments conducted on a single NVIDIA A100 Tensor Core GPU.
Software Dependencies No The paper mentions using "PyTorch" and "PyTorch Geometric (PyG)" but does not provide specific version numbers for these software components, nor for the Adam optimizer.
Experiment Setup Yes During the training phase, we configured the graph neural network as a two-layer weighted GNN with a hidden dimension of 64. The batch size is set to 64, and the maximum number of training epochs is limited to 1000. We employ the Adam optimizer (Diederik, 2014) for training and gradually reduce the learning rate from 1e-3 to 0 using a LambdaLR scheduler.
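The reported schedule, linear decay from 1e-3 to 0 over 1000 epochs, can be sketched as a standalone multiplier function. In PyTorch this multiplier would be passed to `torch.optim.lr_scheduler.LambdaLR`; it is shown here without the framework, and the exact decay shape is an assumption consistent with, but not stated in, the quoted setup.

```python
# Linear learning-rate decay from BASE_LR to 0 over MAX_EPOCHS epochs.
BASE_LR = 1e-3
MAX_EPOCHS = 1000

def lr_lambda(epoch):
    """Multiplicative factor applied to the base learning rate;
    this is the function a LambdaLR scheduler would receive."""
    return max(0.0, 1.0 - epoch / MAX_EPOCHS)

def learning_rate(epoch):
    return BASE_LR * lr_lambda(epoch)
```

At epoch 0 the rate is the full 1e-3, at epoch 500 it has halved, and by epoch 1000 it reaches 0, matching the "1e-3 to 0" range in the setup description.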