Contrastive Learning with Simplicial Convolutional Networks for Short-Text Classification
Authors: Huang Liang, Benedict Lee, Daniel Hui Loong Ng, Kelin Xia
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on four benchmark datasets demonstrate the capability of C-SCN to outperform existing models in analysing sequential and complex short-text data. Section 4 introduces the four short-text classification datasets from various domains used in the experiments. Section 5 presents the performance metrics compared with other models, along with ablation studies. |
| Researcher Affiliation | Collaboration | ¹Nanyang Technological University, Singapore; ²HP Inc., Singapore. Correspondence to: Liang Huang <EMAIL>, Kelin Xia <EMAIL>. |
| Pseudocode | Yes | We include the pseudo-code for C-SCN to enhance the reproducibility in Algorithm 1. Algorithm 1 Algorithm Pseudo Code for C-SCN. |
| Open Source Code | No | The paper includes 'Algorithm 1 Algorithm Pseudo Code for C-SCN', which is pseudocode rather than executable source code. The footnote '1https://pytorch-geometric.readthedocs.io/en/latest/index.html' refers to a third-party library used in the implementation, not the authors' source code for the proposed model. No statement or link provides access to the source code for the methodology described in this paper. |
| Open Datasets | Yes | The experiments are conducted on four datasets for short text classification tasks. The datasets are briefly introduced below, and a summary table is reported in Table 1. ... Twitter (Bird et al., 2009) ... MR (Pang & Lee, 2005) ... Snippets (Phan et al., 2008) ... Stack Overflow (Hamner et al., 2012) |
| Dataset Splits | Yes | Following the few-shot setting for short-text classification (Sun et al., 2022; Wen & Fang, 2023; Liu et al., 2024), 20 samples per category are randomly selected to form the training set, another 20 per category are randomly selected to form the validation set, and the rest are included in the unseen test set. |
| Hardware Specification | Yes | The experiments are conducted ten times with NVIDIA RTX A6000 with 48GB of memory. |
| Software Dependencies | No | The model is trained with the PyTorch Geometric package for 100 epochs with early stopping... (footnote 1 refers to 'https://pytorch-geometric.readthedocs.io/en/latest/index.html'). The paper names the key software package PyTorch Geometric but does not specify a version number. |
| Experiment Setup | Yes | The embedding matrices for 1-simplexes and 2-simplexes are randomly initialised and optimised at size 128. The learning rate is 1e-4, and the batch size is 128. A dropout rate of 50% is applied to reduce model complexity and prevent overfitting. The model is trained with the PyTorch Geometric package for 100 epochs, with early stopping triggered when the validation loss does not improve for ten epochs. The best weights are taken from the model with the best validation accuracy. Cross-entropy loss is used with an Adam optimiser. ... A grid search is conducted to find the η values that give the best performance. |
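The reported split protocol (20 training and 20 validation samples per category, remainder to the unseen test set) and the early-stopping rule (stop after ten epochs without validation-loss improvement) can be sketched as below. This is a minimal illustration of the stated protocol, not the authors' code; the function and class names, and the fixed random seed, are our own assumptions.

```python
import random
from collections import defaultdict


def few_shot_split(labels, n_train=20, n_val=20, seed=0):
    """Per-class few-shot split as described in the paper:
    n_train samples per class for training, n_val per class for
    validation, and all remaining samples form the unseen test set.
    Returns three lists of sample indices."""
    rng = random.Random(seed)  # seed choice is illustrative, not from the paper
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test


class EarlyStopping:
    """Signal a stop when validation loss has not improved
    for `patience` consecutive epochs (the paper uses patience=10)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `EarlyStopping.step` would be called once per epoch with the validation loss, and the best weights would be checkpointed separately whenever validation accuracy improves, matching the paper's description.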