Contrastive Learning with Simplicial Convolutional Networks for Short-Text Classification

Authors: Huang Liang, Benedict Lee, Daniel Hui Loong Ng, Kelin Xia

ICML 2025

Reproducibility assessment (each row lists the variable, the result, and the LLM's supporting response):
Research Type: Experimental. "Experimental results on four benchmark datasets demonstrate the capability of C-SCN to outperform existing models in analysing sequential and complex short-text data." Section 4 introduces the four short-text classification datasets from various domains used in the experiments; Section 5 presents the performance comparisons with other models and the ablation studies.
Researcher Affiliation: Collaboration. 1 Nanyang Technological University, Singapore; 2 HP Inc, Singapore. Correspondence to: Liang Huang <EMAIL>, Kelin Xia <EMAIL>.
Pseudocode: Yes. "We include the pseudo-code for C-SCN to enhance the reproducibility in Algorithm 1." (Algorithm 1: Pseudo Code for C-SCN.)
Open Source Code: No. The paper includes "Algorithm 1: Pseudo Code for C-SCN", which is pseudocode rather than executable source code. The footnote "https://pytorch-geometric.readthedocs.io/en/latest/index.html" refers to a third-party library used in the implementation, not the authors' own code for the proposed model. There is no explicit statement or link providing access to the source code for the described methodology.
Open Datasets: Yes. "The experiments are conducted on four datasets for short text classification tasks. The datasets are briefly introduced below, and a summary table is reported in Table 1. ... Twitter (Bird et al., 2009) ... MR (Pang & Lee, 2005) ... Snippets (Phan et al., 2008) ... Stack Overflow (Hamner et al., 2012)"
Dataset Splits: Yes. "Following the few-shot setting for short-text classification (Sun et al., 2022; Wen & Fang, 2023; Liu et al., 2024), from each category 20 samples are randomly selected to form the train set, another 20 samples are randomly selected to form the validation set, and the rest are included in the unseen test set."
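The per-category split described above (20-shot train, 20-shot validation, remainder as test) can be sketched as follows; the function name, signature, and fixed seed are illustrative, not taken from the paper:

```python
import random
from collections import defaultdict

def few_shot_split(labels, k=20, seed=0):
    """Return (train, val, test) index lists following the paper's protocol:
    k random samples per category for training, another k for validation,
    and all remaining samples for the unseen test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        train.extend(idxs[:k])          # first k shuffled samples -> train
        val.extend(idxs[k:2 * k])       # next k -> validation
        test.extend(idxs[2 * k:])       # remainder -> test
    return train, val, test
```

With, say, 3 categories of 50 samples each, this yields 60 train, 60 validation, and 30 test indices, all disjoint.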
Hardware Specification: Yes. "The experiments are conducted ten times on an NVIDIA RTX A6000 with 48 GB of memory."
Software Dependencies: No. "The model is trained with the PyTorch Geometric package for 100 epochs with early stopping..." (the footnote refers to https://pytorch-geometric.readthedocs.io/en/latest/index.html). The paper names the key software package, PyTorch Geometric, but does not specify a version number.
Experiment Setup: Yes. "The embedding matrices for 1-simplexes and 2-simplexes are randomly initialised and optimised at size 128. The learning rate is 1e-4, and the batch size is 128. A dropout rate of 50% is implemented to reduce the complexity of the model and prevent overfitting. The model is trained with the PyTorch Geometric package for 100 epochs, with early stopping when the validation loss does not improve for ten epochs. The best weights are obtained from the model with the best validation accuracy. Cross-entropy loss is used with an Adam optimiser. ... In our experiments, a grid search is conducted for the best η values."
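The reported training regime (up to 100 epochs, early stopping on validation loss with a patience of 10, best weights selected by validation accuracy) can be sketched as a framework-agnostic loop; the `step` and `evaluate` callables are placeholders standing in for the paper's model code, not its actual implementation:

```python
def train_with_early_stopping(step, evaluate, max_epochs=100, patience=10):
    """Run up to `max_epochs` epochs, stopping when the validation loss has
    not improved for `patience` consecutive epochs, and return the model
    state that achieved the best validation accuracy.

    `step(epoch)` runs one training epoch and returns the model state;
    `evaluate(state)` returns (val_loss, val_accuracy).
    """
    best_loss = float("inf")
    best_acc = -1.0
    best_state = None
    stale = 0
    for epoch in range(max_epochs):
        state = step(epoch)
        val_loss, val_acc = evaluate(state)
        if val_acc > best_acc:            # keep weights with best val accuracy
            best_acc, best_state = val_acc, state
        if val_loss < best_loss:          # patience counter tracks val loss
            best_loss, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_state
```

Note that, as in the paper's description, the stopping criterion (validation loss) and the checkpoint-selection criterion (validation accuracy) are deliberately decoupled.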