Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes
Authors: Yishi Xu, Jianqiao Sun, Yudi Su, Xinyang Liu, Zhibin Duan, Bo Chen, Mingyuan Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted a wealth of quantitative and qualitative experiments, and the results show that our approach comprehensively outperforms established topic models. |
| Researcher Affiliation | Academia | Yishi Xu, Jianqiao Sun, Yudi Su, Xinyang Liu, Zhibin Duan, Bo Chen, National Key Laboratory of Radar Signal Processing, Xidian University, Xi'an, China, 710071, EMAIL, EMAIL; Mingyuan Zhou, McCombs School of Business, The University of Texas at Austin, TX 78712, USA, EMAIL |
| Pseudocode | Yes | In Alg. 1 and Alg. 2, we present the training and meta-testing procedures of our Meta-CETM. |
| Open Source Code | Yes | Our code is available at https://github.com/NoviceStone/Meta-CETM. |
| Open Datasets | Yes | We conducted experiments on four widely used textual benchmark datasets, specifically 20Newsgroups (20NG) [38], Yahoo Answers Topics (Yahoo) [39], DBpedia (DB14) [40], and Web of Science (WOS) [41]. |
| Dataset Splits | No | The paper describes a support set and a query set for each task (80%/20% split) but does not explicitly mention a separate validation set. |
| Hardware Specification | Yes | Finally, we train our model using the Adam optimizer [48] with a learning rate of 1×10⁻² for 10 epochs on an NVIDIA GeForce RTX 3090 graphics card. |
| Software Dependencies | No | The paper mentions 'spaCy' and the 'gensim package' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all compared methods, we set the number of topics as 10. And for all NTMs, the hidden layer size of the encoder is set to 300. For all embedding-based topic models, i.e., ETM, MAML-ETM, Meta-SawETM and our Meta-CETM, we load pretrained GloVe word embeddings [47] as the initialization for a fair comparison. Finally, we train our model using the Adam optimizer [48] with a learning rate of 1×10⁻² for 10 epochs on an NVIDIA GeForce RTX 3090 graphics card. |
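The experiment-setup row above can be collected into a single configuration record. The sketch below is illustrative only: the dictionary keys, the `describe` helper, and its output format are hypothetical and do not come from the authors' released code; only the hyperparameter values are taken from the paper's reported setup.

```python
# Hypothetical summary of the reported Meta-CETM training setup.
# Key names are illustrative; values follow the paper's Experiment Setup row.
config = {
    "num_topics": 10,                 # shared across all compared methods
    "encoder_hidden_size": 300,       # hidden layer size for all NTMs
    "word_embedding_init": "GloVe",   # pretrained init for embedding-based models
    "optimizer": "Adam",
    "learning_rate": 1e-2,
    "epochs": 10,
    "gpu": "NVIDIA GeForce RTX 3090",
}

def describe(cfg):
    """Render the setup as a one-line summary string."""
    return (
        f"{cfg['num_topics']} topics, hidden={cfg['encoder_hidden_size']}, "
        f"{cfg['word_embedding_init']} init, {cfg['optimizer']} "
        f"lr={cfg['learning_rate']}, {cfg['epochs']} epochs on {cfg['gpu']}"
    )

print(describe(config))
```

A record like this makes it easy to diff reported setups across the compared baselines (ETM, MAML-ETM, Meta-SawETM) when checking reproducibility claims.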