reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Effective String Processing and Matching for Author Disambiguation

Authors: Wei-Sheng Chin, Yong Zhuang, Yu-Chin Juan, Felix Wu, Hsiao-Yu Tung, Tong Yu, Jui-Pin Wang, Cheng-Xia Chang, Chun-Pai Yang, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Yu-Chuan Su, Cheng-Kuang Wei, Tu-Chun Yin, Chun-Liang Li, Ting-Wei Lin, Cheng-Hao Tsai, Shou-De Lin, Hsuan-Tien Lin, Chih-Jen Lin

JMLR 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our approach achieves F1-score 0.99202 on the private leader board, while 0.99195 on the public leader board. ... The evaluation measure is the average of F1-scores over all authors in Author.csv. ... In our experiments, the used platform includes 96GB memory and two Intel Xeon E5-2620 2.0 GHz processors of which each has 6 physical cores.
Researcher Affiliation	Academia	Wei-Sheng Chin EMAIL ... Chih-Jen Lin EMAIL Department of Computer Science and Information Engineering National Taiwan University Taipei 106, Taiwan
Pseudocode	Yes	Algorithm 1: The main procedure in the identification stage of the first implementation. ... Algorithm 2: Ensemble of two results.
Open Source Code	Yes	Besides, our implementations are available at https://github.com/kdd-cup-2013-ntu/track2.
Open Datasets	Yes	The data set is offered by Microsoft Academic Search (MAS). As a search engine, MAS integrates information of authors and their publications from different sources. ... The organizers of KDD Cup 2013 release seven files, Author.csv, PaperAuthor.csv, Conference.csv, Journal.csv, Paper.csv, Train.csv, and Valid.csv. ... Other details of data sets and the competition can be found in Roy et al. (2013).
Dataset Splits	Yes	In the competition, 20% of authors in Author.csv are used to evaluate duplicates submitted by participants. We refer to them as results on the public leader board. For the final evaluation, the remaining 80% authors in Author.csv are used and the F1-scores are called results on the private leader board.
Hardware Specification	Yes	In our experiments, the used platform includes 96GB memory and two Intel Xeon E5-2620 2.0 GHz processors of which each has 6 physical cores.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, etc.) that are needed to replicate the experiment.
Experiment Setup	No	The paper describes its methodology in detail, including stages for cleaning, selection, identification, and splitting. However, it does not provide specific numerical hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or system-level training configurations typically found in experimental setups for machine learning models.
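
The Research Type row quotes the evaluation measure as "the average of F1-scores over all authors in Author.csv". The sketch below shows one way such a per-author mean F1 could be computed. It is illustrative only and is not taken from the paper or the competition's scoring code: the function names, the toy author IDs, the convention that each author's duplicate set contains the author itself, and the handling of empty sets are all assumptions.

```python
from typing import Dict, Set


def f1_for_author(predicted: Set[int], actual: Set[int]) -> float:
    """F1 between the predicted and true duplicate-ID sets of one author."""
    if not predicted and not actual:
        # Assumption: two empty sets count as perfect agreement.
        return 1.0
    overlap = len(predicted & actual)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(actual)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: Dict[int, Set[int]], truth: Dict[int, Set[int]]) -> float:
    """Average the per-author F1 over every author in the ground truth."""
    scores = [
        # Assumption: an author missing from the predictions is treated
        # as having no duplicates other than itself.
        f1_for_author(predictions.get(author, {author}), duplicates)
        for author, duplicates in truth.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: authors 1 and 2 are the same person, author 3 is unique.
truth = {1: {1, 2}, 2: {1, 2}, 3: {3}}
predictions = {1: {1, 2}, 2: {2}, 3: {3}}

score = mean_f1(predictions, truth)
print(f"mean F1 = {score:.4f}")
```

In this toy case authors 1 and 3 score 1.0 and author 2 scores 2/3 (precision 1, recall 0.5), so the mean is 8/9. Averaging per author rather than pooling all pairs gives every author equal weight regardless of how many duplicates that author has.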