Effective String Processing and Matching for Author Disambiguation
Authors: Wei-Sheng Chin, Yong Zhuang, Yu-Chin Juan, Felix Wu, Hsiao-Yu Tung, Tong Yu, Jui-Pin Wang, Cheng-Xia Chang, Chun-Pai Yang, Wei-Cheng Chang, Kuan-Hao Huang, Tzu-Ming Kuo, Shan-Wei Lin, Young-San Lin, Yu-Chen Lu, Yu-Chuan Su, Cheng-Kuang Wei, Tu-Chun Yin, Chun-Liang Li, Ting-Wei Lin, Cheng-Hao Tsai, Shou-De Lin, Hsuan-Tien Lin, Chih-Jen Lin
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach achieves F1-score 0.99202 on the private leader board, while 0.99195 on the public leader board. ... The evaluation measure is the average of F1-scores over all authors in Author.csv. ... In our experiments, the used platform includes 96GB memory and two Intel Xeon E5-2620 2.0 GHz processors of which each has 6 physical cores. |
| Researcher Affiliation | Academia | Wei-Sheng Chin EMAIL ... Chih-Jen Lin EMAIL Department of Computer Science and Information Engineering National Taiwan University Taipei 106, Taiwan |
| Pseudocode | Yes | Algorithm 1: The main procedure in the identification stage of the first implementation. ... Algorithm 2: Ensemble of two results. |
| Open Source Code | Yes | Besides, our implementations are available at https://github.com/kdd-cup-2013-ntu/track2. |
| Open Datasets | Yes | The data set is offered by Microsoft Academic Search (MAS). As a search engine, MAS integrates information of authors and their publications from different sources. ... The organizers of KDD Cup 2013 release seven files, Author.csv, PaperAuthor.csv, Conference.csv, Journal.csv, Paper.csv, Train.csv, and Valid.csv. ... Other details of data sets and the competition can be found in Roy et al. (2013). |
| Dataset Splits | Yes | In the competition, 20% of authors in Author.csv are used to evaluate duplicates submitted by participants. We refer to them as results on the public leader board. For the final evaluation, the remaining 80% authors in Author.csv are used and the F1-scores are called results on the private leader board. |
| Hardware Specification | Yes | In our experiments, the used platform includes 96GB memory and two Intel Xeon E5-2620 2.0 GHz processors of which each has 6 physical cores. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, etc.) that are needed to replicate the experiment. |
| Experiment Setup | No | The paper describes its methodology in detail, including stages for cleaning, selection, identification, and splitting. However, it does not provide specific numerical hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or system-level training configurations typically found in experimental setups for machine learning models. |
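
The evaluation measure quoted in the Research Type row (the average of F1-scores over all authors in Author.csv) can be illustrated with a minimal sketch. It is not the organizers' scoring script: the function names `f1` and `mean_f1`, the dictionary layout, and the convention that every author's duplicate set contains the author's own ID are assumptions made for illustration only.

```python
from typing import Dict, Set


def f1(predicted: Set[int], actual: Set[int]) -> float:
    """F1-score between a predicted and a true set of duplicate author IDs."""
    if not predicted and not actual:
        # Assumed convention: two empty sets count as a perfect match.
        return 1.0
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: Dict[int, Set[int]], truth: Dict[int, Set[int]]) -> float:
    """Average the per-author F1-scores over every author in `truth`."""
    if not truth:
        raise ValueError("truth must contain at least one author")
    # Authors missing from `predictions` are scored against an empty set.
    scores = [f1(predictions.get(author, set()), dups) for author, dups in truth.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: authors 1 and 2 are duplicates, author 3 is unique.
truth = {1: {1, 2}, 2: {1, 2}, 3: {3}}
predictions = {1: {1, 2}, 2: {2}, 3: {3}}
score = mean_f1(predictions, truth)
```

In this toy case author 2's prediction misses one duplicate, giving precision 1 and recall 0.5 for that author, so the average falls below 1 even though the other two authors are matched exactly.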