CPM-based Hierarchical Text Classification

Authors: Biqing Zeng, Yihao Peng, Jichen Yang, Peilin Hong, Junjie Liang

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To comprehensively evaluate the performance of our model, we conducted experiments and analysis on two datasets: Web-of-Science (WOS) and New York Times (NYT)."
Researcher Affiliation | Academia | "Biqing Zeng EMAIL, Yihao Peng EMAIL, School of Artificial Intelligence, South China Normal University, Foshan 528225, China; Jichen Yang EMAIL, School of Cyber Security, Guangdong Polytechnic University, Guangzhou 510225, China; Peilin Hong EMAIL, Junjie Liang EMAIL, School of Artificial Intelligence, South China Normal University, Foshan 528225, China"
Pseudocode | No | The paper describes the proposed method using textual explanations and mathematical formulas (equations 1-10) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about providing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | "To comprehensively evaluate the performance of our model, we conducted experiments and analysis on two datasets: Web-of-Science (WOS) and New York Times (NYT)." Both datasets are publicly available, with their original versions obtainable from the following sources: the WOS dataset can be accessed at https://github.com/kk7nc/HDLTex, and the NYT dataset is available at https://catalog.ldc.upenn.edu/LDC2008T19.
Dataset Splits | Yes | WOS (hierarchy depth 2): 30,070 train / 7,518 val / 9,397 test, 141 labels; NYT (hierarchy depth 8): 23,345 train / 5,834 val / 7,292 test, 166 labels.
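As a quick sanity check on the reported counts, the proportions implied by the table above can be computed directly; this sketch is ours, not code from the paper, and only uses the numbers quoted in the row.

```python
# Verify the train/val/test proportions implied by the reported counts.
splits = {
    "WOS": {"train": 30070, "val": 7518, "test": 9397},
    "NYT": {"train": 23345, "val": 5834, "test": 7292},
}

for name, s in splits.items():
    total = sum(s.values())
    ratios = {k: round(v / total, 3) for k, v in s.items()}
    print(name, total, ratios)
# Both datasets follow roughly a 64% / 16% / 20% split.
```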
Hardware Specification | Yes | "We implemented our model in an end-to-end manner using PyTorch and conducted a series of experiments on an NVIDIA 3090 GPU."
Software Dependencies | No | "We implemented our model in an end-to-end manner using PyTorch and conducted a series of experiments on an NVIDIA 3090 GPU. Following the approach of previous work (HPT), we used bert-base-uncased as our base architecture."
Experiment Setup | Yes | "The batch size was set to 16, the optimizer was Adam, and the learning rate was 3e-5. Typically, for each parent class, we selected the top 4 extracted keywords to generate concepts. Unless we were specifically evaluating the impact of the number of selected keywords, we did not adjust any other hyperparameters in other experiments. During training, we set the stopping criteria to halt the training process if both Macro-F1 and Micro-F1 scores did not improve over 5 epochs."
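The stopping rule quoted above (halt when neither Macro-F1 nor Micro-F1 improves for 5 consecutive epochs) can be sketched as a small helper. This is a minimal illustration under our own naming assumptions (`EarlyStopper`, `step`), not the authors' implementation.

```python
class EarlyStopper:
    """Stop training when neither Macro-F1 nor Micro-F1 improves
    for `patience` consecutive epochs (5 in the paper's setup)."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_macro = float("-inf")
        self.best_micro = float("-inf")
        self.stale_epochs = 0

    def step(self, macro_f1, micro_f1):
        """Record one epoch's validation scores; return True to stop."""
        improved = False
        if macro_f1 > self.best_macro:
            self.best_macro = macro_f1
            improved = True
        if micro_f1 > self.best_micro:
            self.best_micro = micro_f1
            improved = True
        self.stale_epochs = 0 if improved else self.stale_epochs + 1
        return self.stale_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch after evaluation on the validation set, breaking out of the loop when it returns True.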