Node Classification With Reject Option

Authors: Uday Bhaskar Kuchipudi, Jayadratha Gayen, Charu Sharma, Naresh Manwani

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments using our method on the standard citation network datasets Cora, CiteSeer, PubMed, and ogbn-arxiv. We also model the legal judgment prediction problem on the ILDC dataset as a node classification problem, where nodes represent legal cases and edges represent citations. We further interpret the model by analyzing the cases in which it abstains from predicting and by visualizing which parts of the input features influenced this decision.
Researcher Affiliation | Academia | Uday Bhaskar Kuchipudi, Machine Learning Lab, IIIT Hyderabad, India
Pseudocode | No | The paper describes the NCwR-Cov and NCwR-Cost architectures using diagrams (Figures 1 and 2) and mathematical formulations, but it does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "We modify the open-source GAT implementation by Antognini (2021) for our approach." and cites a GitHub link for that base GAT implementation. However, it does not provide access to the authors' own implementation of NCwR or any code specific to the methodology described in the paper.
Open Datasets | Yes | We evaluate our model on three standard citation network datasets, Cora, CiteSeer, and PubMed (Sen et al., 2008), and the OGB dataset ogbn-arxiv (Hu et al., 2020). The paper also uses the Indian Legal Documents Corpus (ILDC) (Malik et al., 2021), the UCI Thyroid dataset (Quinlan, 1986), and the Pima Indians Diabetes dataset (Smith et al., 1988).
Dataset Splits | Yes | We use 20 nodes per class for training, 500 nodes for validation, and 1,000 for testing on the three Planetoid datasets, and the given splits for the OGB dataset. This subset contains a total of 7,593 cases, split into train/test/development sets (5,082/1,517/994). Approximately 10% of the training data was reserved as a validation set, ensuring a fair evaluation of the model's performance during training. The graph was constructed by concatenating the training (85%) and test (15%) sets, treating them as a unified structure to facilitate transductive learning.
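The ILDC split description above can be sketched as boolean node masks over a unified graph (a minimal NumPy illustration; the node ordering, the exact validation fraction, and the mask names are assumptions for illustration, not details taken from the paper):

```python
import numpy as np

# Reported ILDC subset sizes (train / test / development)
n_train, n_test, n_dev = 5082, 1517, 994
assert n_train + n_test + n_dev == 7593  # total cases in the subset

# Unified graph over train + test nodes for transductive learning
n_graph = n_train + n_test
train_mask = np.zeros(n_graph, dtype=bool)
train_mask[:n_train] = True  # hypothetical ordering: train nodes first
test_mask = ~train_mask

# Roughly 10% of training nodes held out as a validation set
n_val = round(0.10 * n_train)
val_mask = np.zeros(n_graph, dtype=bool)
val_mask[n_train - n_val:n_train] = True
train_mask[n_train - n_val:n_train] = False
```

In a transductive setting all node features and edges are visible during training, so the masks, not separate graphs, determine which labels contribute to the loss.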
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | No | The paper mentions modifying an "open-source GAT implementation" but does not list specific software dependencies (e.g., library names with version numbers such as PyTorch 1.x, Python 3.x, or CUDA x.x) used in the authors' own work.
Experiment Setup | Yes | We first apply dropout (Srivastava et al., 2014) on the node features with p = 0.6. These node features, along with the adjacency matrix, are passed through a GAT layer with 8 attention heads, where each head produces eight features per node. We use LeakyReLU with α = 0.2 as the activation function inside the GAT layer. Another dropout layer with the same probability follows this. We set λ = 32 as the constraint on coverage when calculating this loss. A cross-entropy loss is computed on the output of the auxiliary head but is not used for making predictions. A convex combination of these two loss values with αl = 0.5 is used for backpropagation. We follow the experimental setup presented in Khatri et al. (2023) and use a pretrained XLNet model from Malik et al. (2021) to extract language embeddings from all the new cases in the dataset.
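The coverage-constrained objective described above (λ = 32 coverage penalty, auxiliary-head cross-entropy, convex combination with αl = 0.5) can be sketched as follows. This is a minimal NumPy illustration assuming a SelectiveNet-style selective loss; the function name `ncwr_cov_loss`, the target coverage value, and the exact quadratic form of the penalty are assumptions, not details confirmed by the paper:

```python
import numpy as np

def ncwr_cov_loss(probs, select, aux_probs, labels,
                  target_cov=0.8, lam=32.0, alpha_l=0.5):
    """Hypothetical coverage-constrained selective loss with auxiliary head.

    probs:     (n, c) class probabilities from the prediction head
    select:    (n,)   selection scores in [0, 1] (1 = predict, 0 = abstain)
    aux_probs: (n, c) class probabilities from the auxiliary head
    labels:    (n,)   ground-truth class indices
    """
    n = labels.shape[0]
    idx = np.arange(n)
    ce = -np.log(probs[idx, labels] + 1e-12)                 # per-node cross-entropy
    coverage = select.mean()                                  # empirical coverage
    sel_risk = (ce * select).sum() / (coverage * n + 1e-12)   # risk on covered nodes
    penalty = lam * max(0.0, target_cov - coverage) ** 2      # coverage shortfall
    aux_ce = -np.log(aux_probs[idx, labels] + 1e-12).mean()   # auxiliary head loss
    # Convex combination of the selective and auxiliary objectives
    return alpha_l * (sel_risk + penalty) + (1.0 - alpha_l) * aux_ce
```

Abstaining on many nodes drives the coverage term up, so the λ-weighted penalty discourages the trivial solution of rejecting everything, while the auxiliary head keeps gradients flowing to all nodes.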