Effective String Processing and Matching for Author Disambiguation

Authors: Wei-Sheng Chin, Yong Zhuang, Yu-Chin Juan, Felix Wu, Hsiao-Yu Tung, Tong Yu, Jui-Pin Wang, Cheng-Xia Chang, Chun-Pai Yang, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Yu-Chuan Su, Cheng-Kuang Wei, Tu-Chun Yin, Chun-Liang Li, Ting-Wei Lin, Cheng-Hao Tsai, Shou-De Lin, Hsuan-Tien Lin, Chih-Jen Lin

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach achieves F1-score 0.99202 on the private leader board, while 0.99195 on the public leader board. ... The evaluation measure is the average of F1-scores over all authors in Author.csv. ... In our experiments, the used platform includes 96GB memory and two Intel Xeon E5-2620 2.0 GHz processors of which each has 6 physical cores.
Researcher Affiliation | Academia | Wei-Sheng Chin EMAIL ... Chih-Jen Lin EMAIL, Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan
Pseudocode | Yes | Algorithm 1: The main procedure in the identification stage of the first implementation. ... Algorithm 2: Ensemble of two results.
Open Source Code | Yes | Besides, our implementations are available at https://github.com/kdd-cup-2013-ntu/track2.
Open Datasets | Yes | The data set is offered by Microsoft Academic Search (MAS). As a search engine, MAS integrates information of authors and their publications from different sources. ... The organizers of KDD Cup 2013 release seven files, Author.csv, PaperAuthor.csv, Conference.csv, Journal.csv, Paper.csv, Train.csv, and Valid.csv. ... Other details of data sets and the competition can be found in Roy et al. (2013).
Dataset Splits | Yes | In the competition, 20% of authors in Author.csv are used to evaluate duplicates submitted by participants. We refer to them as results on the public leader board. For the final evaluation, the remaining 80% authors in Author.csv are used and the F1-scores are called results on the private leader board.
Hardware Specification | Yes | In our experiments, the used platform includes 96GB memory and two Intel Xeon E5-2620 2.0 GHz processors of which each has 6 physical cores.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) that are needed to replicate the experiment.
Experiment Setup | No | The paper describes its methodology in detail, including stages for cleaning, selection, identification, and splitting. However, it does not provide specific numerical hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or system-level training configurations typically found in experimental setups for machine learning models.
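
The evaluation measure quoted above (the average of F1-scores over all authors in Author.csv) can be illustrated with a minimal sketch. It assumes that each author's prediction and ground truth are represented as sets of duplicate author IDs; the function names, the dictionary layout, and the handling of empty sets are illustrative assumptions, not taken from the paper or the competition's scoring code.

```python
from typing import Dict, Set


def f1_for_author(predicted: Set[int], actual: Set[int]) -> float:
    """F1-score between the predicted and true duplicate sets of one author."""
    if not predicted and not actual:
        # Assumption: two empty sets count as a perfect match.
        return 1.0
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: Dict[int, Set[int]],
            ground_truth: Dict[int, Set[int]]) -> float:
    """Average the per-author F1-scores over every author in ground_truth."""
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one author")
    scores = [
        # Authors missing from predictions are scored against an empty set.
        f1_for_author(predictions.get(author_id, set()), duplicates)
        for author_id, duplicates in ground_truth.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: authors 1 and 2 are duplicates of each other.
toy_truth = {1: {1, 2}, 2: {1, 2}, 3: {3}}
toy_prediction = {1: {1, 2}, 2: {2}, 3: {3}}
toy_score = mean_f1(toy_prediction, toy_truth)
```

In this toy case author 2 is only partially matched (precision 1, recall 0.5), so the mean falls below 1 even though the other two authors are scored perfectly.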