HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding
Authors: Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], TagMyNews (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51]. The statistics of these datasets are presented in Table 1. |
| Researcher Affiliation | Academia | Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan: National Laboratory of Radar Signal Processing, Xidian University, Xi'an, China (EMAIL, EMAIL); Mingyuan Zhou: McCombs School of Business, The University of Texas at Austin, USA (EMAIL) |
| Pseudocode | Yes | Algorithm 1 Knowledge-Guided Topic Taxonomy Mining. Input: mini-batch size B, number of layers T, adjacency matrix A built from concept taxonomy. Initialize the variational network parameters Ω and the word and topic embeddings {α^(l)}, l = 0, ..., L; while not converged do... |
| Open Source Code | Yes | Our code is available at https://github.com/NoviceStone/HyperMiner |
| Open Datasets | Yes | We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], TagMyNews (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51]. |
| Dataset Splits | No | Concretely, with the default training/test split of each dataset, we first train a topic model on the training set, and then the trained model is used to extract features θ of all test documents. |
| Hardware Specification | No | The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]' in the checklist, but the main text does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud provider instances used for experiments. |
| Software Dependencies | No | Apart from Python 3.8, which is mentioned only in the author checklist, the paper gives no version numbers for its software dependencies. The main text names no key libraries or solvers with versions. |
| Experiment Setup | Yes | The embedding dimension for embedded topic models is set as 50. ... τ is the temperature parameter. ... λ is the hyper-parameter used to control the impact of the regularization term... Input: mini-batch size B... |
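The title says that words and topics are embedded in hyperbolic space, and the setup row mentions a temperature τ. The sketch below illustrates how these two pieces can fit together. It rests on two assumptions:

- The embeddings live in the Poincaré ball model of hyperbolic space. The distance formula itself is standard.
- τ scales negative distances inside a softmax, producing a topic-word distribution.

The table excerpt does not say where τ is actually used, so `topic_word_distribution` is a hypothetical helper and not the authors' code. The paper uses 50-dimensional embeddings; the toy example uses 2-D points for readability.

```python
import math
from typing import List, Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball (curvature -1)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2) (1 - ||v||^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def topic_word_distribution(
    topic: Sequence[float], words: List[Sequence[float]], tau: float = 1.0
) -> List[float]:
    """HYPOTHETICAL helper: softmax over -distance / tau.

    Words closer to the topic embedding get more probability mass;
    a smaller tau makes the distribution sharper.
    """
    logits = [-poincare_distance(topic, w) / tau for w in words]
    peak = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    topic = [0.0, 0.0]  # toy 2-D embeddings; the paper uses 50 dimensions
    words = [[0.1, 0.0], [0.5, 0.0], [0.0, 0.9]]
    print(topic_word_distribution(topic, words, tau=0.5))
```

Points near the boundary of the ball are far from the origin under this metric. This is why hyperbolic space is often used for tree-like (taxonomic) structure: general concepts can sit near the center and specific ones near the edge.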
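Algorithm 1 takes as input an adjacency matrix A built from a concept taxonomy, but the excerpt does not specify how A is constructed. The sketch below shows one plausible way to turn (parent, child) taxonomy edges into a dense 0/1 adjacency matrix. It makes two assumptions that are not from the paper: edges are undirected, and self-loops are added. The helper name `build_adjacency` and the toy taxonomy are hypothetical.

```python
from typing import Dict, List, Tuple


def build_adjacency(
    concepts: List[str],
    edges: List[Tuple[str, str]],
    self_loops: bool = True,
) -> List[List[int]]:
    """HYPOTHETICAL helper: dense 0/1 adjacency matrix from (parent, child) edges.

    ASSUMPTIONS (not stated in the paper excerpt): edges are treated as
    undirected, and self-loops are added so each concept links to itself.
    """
    index: Dict[str, int] = {name: i for i, name in enumerate(concepts)}
    n = len(concepts)
    adj = [[0] * n for _ in range(n)]
    for parent, child in edges:
        i, j = index[parent], index[child]  # raises KeyError for unknown concepts
        adj[i][j] = 1
        adj[j][i] = 1  # symmetric: record the relation in both directions
    if self_loops:
        for i in range(n):
            adj[i][i] = 1
    return adj


if __name__ == "__main__":
    concepts = ["entity", "animal", "dog", "cat"]  # toy taxonomy
    edges = [("entity", "animal"), ("animal", "dog"), ("animal", "cat")]
    for row in build_adjacency(concepts, edges):
        print(row)
```

For a large taxonomy, a sparse representation would be more practical than a dense list of lists. The dense form is used here only to keep the example dependency-free.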