Supervised Word Mover's Distance
Authors: Gao Huang, Chuan Guo, Matt J. Kusner, Yu Sun, Fei Sha, Kilian Q. Weinberger
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate S-WMD on eight real-world text classification tasks on which it consistently outperforms almost all of our 26 competitive baselines. We evaluate all approaches on 8 document datasets in the settings of news categorization, sentiment analysis, and product identification, among others. Table 1 describes the classification tasks as well as the size and number of classes C of each of the datasets. We evaluate against the following document representation/distance methods: ... Table 2: The kNN test error for all datasets and distances. |
| Researcher Affiliation | Academia | Gao Huang, Chuan Guo (Cornell University); Matt J. Kusner (Alan Turing Institute, University of Warwick); Yu Sun, Kilian Q. Weinberger (Cornell University); Fei Sha (University of California, Los Angeles) |
| Pseudocode | Yes | Algorithm 1 S-WMD |
| Open Source Code | Yes | Our code is implemented in Matlab and is freely available at https://github.com/gaohuang/S-WMD. |
| Open Datasets | Yes | We evaluate S-WMD on 8 different document corpora... Table 1: The document datasets (and their descriptions) used for visualization and evaluation. ... REUTERS news dataset (train/test split [3]) ... TWITTER tweets categorized by sentiment [31] ... 20NEWS canonical news article dataset [3] |
| Dataset Splits | No | For datasets that do not have a predefined train/test split (BBCSPORT, TWITTER, RECIPE, CLASSIC, and AMAZON), we average results over five 70/30 train/test splits and report standard errors. The paper does not explicitly state train/validation/test splits or mention a specific validation-split percentage. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | Our code is implemented in Matlab. The paper does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | In our experiments, we use λ = 10, which leads to a nice trade-off between speed and approximation accuracy. In our experiments, we set B = 32 and N = 200, and computing the gradient at each iteration can be done in seconds. |
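The λ = 10 quoted above is the entropy-regularization strength of the Sinkhorn approximation that S-WMD uses to make Word Mover's Distance (and its gradient) fast to compute. The paper's released code is Matlab; below is a minimal, hedged Python/NumPy sketch of just this Sinkhorn-approximated transport distance between two documents represented as normalized bags of word vectors. The function name `sinkhorn_wmd` and all inputs are illustrative, and the supervised part of S-WMD (the learned linear transformation and word-importance weights) is deliberately omitted.

```python
import numpy as np

def sinkhorn_wmd(X1, d1, X2, d2, lam=10.0, n_iter=200):
    """Entropy-regularized (Sinkhorn) approximation to the Word Mover's
    Distance between two documents.

    X1 : (n1, d) word-embedding matrix of document 1 (illustrative input)
    d1 : (n1,) normalized bag-of-words weights of document 1
    X2, d2 : same for document 2
    lam : regularization strength (the paper reports lambda = 10)
    """
    # Pairwise squared-Euclidean transport costs between word vectors.
    M = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=2)
    # Gibbs kernel; larger lam means a sharper (closer-to-exact) plan.
    K = np.exp(-lam * M)
    u = np.ones(len(d1)) / len(d1)
    # Sinkhorn fixed-point iterations on the row-scaling vector u.
    for _ in range(n_iter):
        u = d1 / (K @ (d2 / (K.T @ u)))
    v = d2 / (K.T @ u)
    T = u[:, None] * K * v[None, :]  # approximate transport plan
    return np.sum(T * M)             # approximate transport cost
```

In the exact WMD, the transport plan comes from a linear program; the entropic term lets Sinkhorn's matrix-scaling iterations replace that solver, which is what makes the per-iteration gradient computation "done in seconds" feasible at the paper's batch sizes.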