Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neural Bag-of-Ngrams
Authors: Bofang Li, Tao Liu, Zhe Zhao, Puwei Wang, Xiaoyong Du
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform qualitative evaluation on the IMDB dataset (Table 2), and quantitative evaluation on a text classification task (7 datasets) and a semantic relatedness task (2 datasets with 7 categories). |
| Researcher Affiliation | Academia | Bofang Li, Tao Liu, Zhe Zhao, Puwei Wang, Xiaoyong Du; School of Information, Renmin University of China, Beijing, China; Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing, China |
| Pseudocode | No | The paper describes methods textually and mathematically but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of Neural-BoN is published at https://github.com/libofang/Neural-BoN. |
| Open Datasets | Yes | For the text classification task, hyper-parameters are tuned on 20% of the training data from the IMDB dataset (Maas et al. 2011). For the semantic relatedness task, hyper-parameters are tuned on the development data from the SICK dataset (Marelli et al. 2014). Similar to previous research, the Toronto Books Corpus is used as training data. |
| Dataset Splits | Yes | For text classification task, hyper-parameters are tuned on 20% of the training data from IMDB dataset (Maas et al. 2011). For semantic relatedness task, hyper-parameters are tuned on the development data from SICK dataset (Marelli et al. 2014). |
| Hardware Specification | Yes | Table 3: Approximate training time of models for a single epoch on one million words. CPU: Intel Xeon E5-2670 (32core). GPU: NVIDIA Tesla K40. |
| Software Dependencies | No | The paper mentions techniques like 'Negative Sampling', 'stochastic gradient descent', and 'backpropagation', but does not list any specific software or library names with version numbers used for implementation. |
| Experiment Setup | Yes | Optimal hyper-parameters are actually identical: the vector dimension is 500, the learning rate is fixed to 0.25, the negative sampling size is 5, and models are trained for 10 iterations. |
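For context, the reported setup (vector dimension 500, fixed learning rate 0.25, negative sampling size 5, 10 iterations) corresponds to a standard negative-sampling training loop. The sketch below is a minimal illustration of how those hyper-parameters plug into one SGNS-style update; the vocabulary size, initialization, and sampled word pair are placeholders, and this is not the authors' implementation.

```python
import numpy as np

# Hyper-parameters as reported in the paper's experiment setup.
DIM, LR, NEG, EPOCHS = 500, 0.25, 5, 10

rng = np.random.default_rng(0)
VOCAB = 100  # placeholder vocabulary size, not from the paper

W_in = rng.normal(0.0, 0.1, (VOCAB, DIM))   # input (word/n-gram) vectors
W_out = np.zeros((VOCAB, DIM))               # output (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(center, context):
    """One negative-sampling step: attract the true context word,
    repel NEG randomly drawn negative words, at learning rate LR."""
    targets = [context] + list(rng.integers(0, VOCAB, NEG))
    labels = [1.0] + [0.0] * NEG
    grad_in = np.zeros(DIM)
    for t, y in zip(targets, labels):
        score = sigmoid(W_in[center] @ W_out[t])
        g = LR * (y - score)          # gradient of log-sigmoid loss
        grad_in += g * W_out[t]
        W_out[t] += g * W_in[center]
    W_in[center] += grad_in

# Placeholder training loop over a single (center, context) pair.
for _ in range(EPOCHS):
    sgns_update(center=1, context=2)
```

A full run would iterate this update over all (n-gram, context) pairs drawn from the corpus for each of the 10 epochs.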