Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes
Authors: Yishi Xu, Jianqiao Sun, Yudi Su, Xinyang Liu, Zhibin Duan, Bo Chen, Mingyuan Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted a wealth of quantitative and qualitative experiments, and the results show that our approach comprehensively outperforms established topic models. |
| Researcher Affiliation | Academia | Yishi Xu, Jianqiao Sun, Yudi Su, Xinyang Liu, Zhibin Duan, Bo Chen, National Key Laboratory of Radar Signal Processing, Xidian University, Xi'an, China, 710071, EMAIL, EMAIL; Mingyuan Zhou, McCombs School of Business, The University of Texas at Austin, TX 78712, USA, EMAIL |
| Pseudocode | Yes | In Alg. 1 and Alg. 2, we present the training and meta-testing procedures of our Meta-CETM. |
| Open Source Code | Yes | Our code is available at https://github.com/NoviceStone/Meta-CETM. |
| Open Datasets | Yes | We conducted experiments on four widely used textual benchmark datasets, specifically 20Newsgroups (20NG) [38], Yahoo Answers Topics (Yahoo) [39], DBpedia (DB14) [40], and Web of Science (WOS) [41]. |
| Dataset Splits | No | The paper describes a support set and a query set for each task (80%/20% split) but does not explicitly mention a separate validation set. |
| Hardware Specification | Yes | Finally, we train our model using the Adam optimizer [48] with a learning rate of 1×10⁻² for 10 epochs on an NVIDIA GeForce RTX 3090 graphics card. |
| Software Dependencies | No | The paper mentions 'spaCy' and the 'gensim package' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all compared methods, we set the number of topics as 10. And for all NTMs, the hidden layer size of the encoder is set to 300. For all embedding-based topic models, i.e., ETM, MAML-ETM, Meta-SawETM and our Meta-CETM, we load pretrained GloVe word embeddings [47] as the initialization for a fair comparison. Finally, we train our model using the Adam optimizer [48] with a learning rate of 1×10⁻² for 10 epochs on an NVIDIA GeForce RTX 3090 graphics card. |
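The experiment-setup row above can be collected into a single configuration record. The sketch below is illustrative only: the dictionary keys, the `describe` helper, and its output format are hypothetical and do not come from the authors' released code; only the hyperparameter values are taken from the paper's reported setup.

```python
# Hypothetical summary of the reported Meta-CETM training setup.
# Key names are illustrative; values follow the paper's Experiment Setup row.
config = {
    "num_topics": 10,                 # shared across all compared methods
    "encoder_hidden_size": 300,       # hidden layer size for all NTMs
    "word_embedding_init": "GloVe",   # pretrained init for embedding-based models
    "optimizer": "Adam",
    "learning_rate": 1e-2,
    "epochs": 10,
    "gpu": "NVIDIA GeForce RTX 3090",
}

def describe(cfg):
    """Render the setup as a one-line summary string."""
    return (
        f"{cfg['num_topics']} topics, hidden={cfg['encoder_hidden_size']}, "
        f"{cfg['word_embedding_init']} init, {cfg['optimizer']} "
        f"lr={cfg['learning_rate']}, {cfg['epochs']} epochs on {cfg['gpu']}"
    )

print(describe(config))
```

A record like this makes it easy to diff reported setups across the compared baselines (ETM, MAML-ETM, Meta-SawETM) when checking reproducibility claims.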