Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neural Bag-of-Ngrams
Authors: Bofang Li, Tao Liu, Zhe Zhao, Puwei Wang, Xiaoyong Du
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform qualitative evaluation on the IMDB dataset (Table 2), and quantitative evaluation on a text classification task (7 datasets) and a semantic relatedness task (2 datasets with 7 categories). |
| Researcher Affiliation | Academia | Bofang Li, Tao Liu, Zhe Zhao, Puwei Wang, Xiaoyong Du; School of Information, Renmin University of China, Beijing, China; Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing, China |
| Pseudocode | No | The paper describes methods textually and mathematically but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of Neural-BoN is published at https://github.com/libofang/Neural-BoN. |
| Open Datasets | Yes | For the text classification task, hyper-parameters are tuned on 20% of the training data from the IMDB dataset (Maas et al. 2011). For the semantic relatedness task, hyper-parameters are tuned on the development data from the SICK dataset (Marelli et al. 2014). Similar to previous research, the Toronto Books Corpus is used as training data. |
| Dataset Splits | Yes | For text classification task, hyper-parameters are tuned on 20% of the training data from IMDB dataset (Maas et al. 2011). For semantic relatedness task, hyper-parameters are tuned on the development data from SICK dataset (Marelli et al. 2014). |
| Hardware Specification | Yes | Table 3: Approximate training time of models for a single epoch on one million words. CPU: Intel Xeon E5-2670 (32core). GPU: NVIDIA Tesla K40. |
| Software Dependencies | No | The paper mentions techniques like 'Negative Sampling', 'stochastic gradient descent', and 'backpropagation', but does not list any specific software or library names with version numbers used for implementation. |
| Experiment Setup | Yes | Optimal hyper-parameters are actually identical: the vector dimension is 500, the learning rate is fixed to 0.25, the negative sampling size is 5, and models are trained for 10 iterations. |
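For context, the reported setup (vector dimension 500, fixed learning rate 0.25, negative sampling size 5, 10 iterations) corresponds to a standard negative-sampling training loop. The sketch below is a minimal illustration of how those hyper-parameters plug into one SGNS-style update; the vocabulary size, initialization, and sampled word pair are placeholders, and this is not the authors' implementation.

```python
import numpy as np

# Hyper-parameters as reported in the paper's experiment setup.
DIM, LR, NEG, EPOCHS = 500, 0.25, 5, 10

rng = np.random.default_rng(0)
VOCAB = 100  # placeholder vocabulary size, not from the paper

W_in = rng.normal(0.0, 0.1, (VOCAB, DIM))   # input (word/n-gram) vectors
W_out = np.zeros((VOCAB, DIM))               # output (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(center, context):
    """One negative-sampling step: attract the true context word,
    repel NEG randomly drawn negative words, at learning rate LR."""
    targets = [context] + list(rng.integers(0, VOCAB, NEG))
    labels = [1.0] + [0.0] * NEG
    grad_in = np.zeros(DIM)
    for t, y in zip(targets, labels):
        score = sigmoid(W_in[center] @ W_out[t])
        g = LR * (y - score)          # gradient of log-sigmoid loss
        grad_in += g * W_out[t]
        W_out[t] += g * W_in[center]
    W_in[center] += grad_in

# Placeholder training loop over a single (center, context) pair.
for _ in range(EPOCHS):
    sgns_update(center=1, context=2)
```

A full run would iterate this update over all (n-gram, context) pairs drawn from the corpus for each of the 10 epochs.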