reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding

Authors: Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments: We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], Tag My News (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51]. The statistics of these datasets are presented in Table 1.
Researcher Affiliation	Academia	Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan National Laboratory of Radar Signal Processing, Xidian University, Xi'an, China EMAIL, EMAIL Mingyuan Zhou McCombs School of Business, The University of Texas at Austin, USA EMAIL
Pseudocode	Yes	Algorithm 1 Knowledge-Guided Topic Taxonomy Mining Input: mini-batch size B, number of layers T, adjacent matrix A built from concept taxonomy. Initialize the variational network parameters Ω and the word and topic embeddings {α^(l)}_{l=0}^{L}; while not converged do...
Open Source Code	Yes	Our code is available at https://github.com/NoviceStone/HyperMiner
Open Datasets	Yes	We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], Tag My News (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51].
Dataset Splits	No	Concretely, with the default training/test split of each dataset, we first train a topic model on the training set, and then the trained model is used to extract features θ of all test documents.
Hardware Specification	No	The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]' in the checklist, but the main text does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud provider instances used for experiments.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies beyond a general mention of Python 3.8 in the author checklist, and no other key libraries or solvers with their versions are specified in the main text.
Experiment Setup	Yes	The embedding dimension for embedded topic models is set as 50. ... τ is the temperature parameter. ... λ is the hyper-parameter used to control the impact of the regularization term... Input: mini-batch size B...