Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Comparative Evaluation of Link-Based Approaches for Candidate Ranking in Link-to-Wikipedia Systems

Authors: N. Fernandez Garcia, J. Arias Fisteus, L. Sanchez Fernandez

JAIR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In addition, we provide a comparative empirical evaluation of the different approaches on five different corpora: the TAC 2010 corpus and four corpora built from actual Wikipedia articles and news items. ... The main goals of this paper are twofold: (1) offer an overview of link-based approaches for candidate ranking in link-to-Wikipedia systems; and, (2) perform an empirical evaluation to compare these approaches. ... This section reports the results of the evaluation of the different approaches described in section 3.
Researcher Affiliation | Academia | Norberto Fernández García, Jesús Arias Fisteus, Luis Sánchez Fernández. Telematics Engineering Department, Universidad Carlos III de Madrid, Avda. Universidad, 30, E-28911 Leganés, Madrid, Spain.
Pseudocode | No | The paper describes its algorithms and methods using mathematical formulations and textual descriptions, but does not include any clearly labeled pseudocode or algorithm blocks. For example, Section 3 ("Overview of Approaches") describes methods such as "Bag-of-Links Approaches" and "Graph Approaches" using prose and equations.
Open Source Code | No | The paper mentions using a third-party tool: "In particular, we took advantage of the open source implementation of ListNet provided by the University of Massachusetts RankLib package (Van B. Dang, 2014)." However, it neither links to nor announces a release of the authors' own implementation of the methodology described in the paper.
Open Datasets | Yes | To summarize, we carry out our evaluation in five different corpora (Cucerzan news, Cucerzan Wikipedia, Wikipedia random, Wikinews and TAC 2010)... These corpora are available to download at: http://www.it.uc3m.es/berto/link-to-wikipedia/survey/ ... information has been obtained from a dump of Wikipedia page links provided by DBpedia (Bizer et al., 2009) version 3.8 (footnote 2), which was generated from a full Wikipedia dump dated in June 2012. The links dump was preprocessed as follows: ... 2. http://downloads.dbpedia.org/3.8/en/page_links_en.nt.bz2 (April, 2014) 3. http://downloads.dbpedia.org/3.8/en/redirects_en.nt.bz2 (April, 2014) 4. http://downloads.dbpedia.org/3.8/en/disambiguations_en.nt.bz2 (April, 2014)
Dataset Splits | Yes | In order to report the accuracy, MRR and DCG@K of the ListNet variants, we use the results obtained by averaging 10 repetitions of 10-fold cross validation on the particular corpus being analyzed.
Hardware Specification | Yes | In particular, the average run-time per query (in seconds) measured on a Linux 2.6.32, Intel Core i7 2.80GHz PC with 16GB RAM was under one second for all the approaches except RWF and PPR, which run closer to 4 and 571 seconds per query respectively.
Software Dependencies | Yes | In particular, the average run-time per query (in seconds) measured on a Linux 2.6.32... information has been obtained from a dump of Wikipedia page links provided by DBpedia (Bizer et al., 2009) version 3.8... We used the information retrieval library Apache Lucene (Apache Software Foundation, 2014) to build an index... using the Stanford NER tool (Finkel, Grenager, & Manning, 2005)... we took advantage of the open source implementation of ListNet provided by the University of Massachusetts RankLib package (Van B. Dang, 2014).
Experiment Setup | Yes | where d is a damping factor which can be set between 0 and 1, but is typically set to 0.85 according to Brin and Page (1998) and Hachey et al. (2011). ... In this case we will use the values k_L = 0.55 and k_C = 0.25 as suggested by Fernández et al. (2010). ... In all the cases, we have used the configuration parameters for ListNet that are suggested in the RankLib implementation (1500 epochs and a learning rate of 10^-5).
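The Open Datasets row quotes a DBpedia 3.8 page-links dump together with redirects and disambiguations dumps, but the preprocessing steps themselves are elided in the excerpt. As a hypothetical illustration only (not the authors' procedure), the sketch below reads N-Triples lines whose subject, predicate and object are all IRIs, and rewrites each link endpoint through the redirect table. The function names and the decision to drop self-links produced by redirect resolution are assumptions. Because the code ignores the predicate, it does not depend on the exact property IRIs used in the dumps.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def iri_triples(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object); skip comments and literal objects."""
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m:
            yield m.group(1), m.group(2), m.group(3)


def resolve(page: str, redirects: dict[str, str]) -> str:
    """Follow a redirect chain to its final target, stopping on a cycle."""
    seen = {page}
    while page in redirects and redirects[page] not in seen:
        page = redirects[page]
        seen.add(page)
    return page


def build_link_graph(link_lines: Iterable[str],
                     redirect_lines: Iterable[str]) -> dict[str, set[str]]:
    """Hypothetical preprocessing: page-link graph with redirects resolved."""
    redirects = {s: o for s, _, o in iri_triples(redirect_lines)}
    graph: dict[str, set[str]] = defaultdict(set)
    for s, _, o in iri_triples(link_lines):
        src, dst = resolve(s, redirects), resolve(o, redirects)
        if src != dst:  # assumption: drop self-links created by redirects
            graph[src].add(dst)
    return dict(graph)
```

To read a compressed dump directly, the lines can come from `bz2.open(path, "rt", encoding="utf-8")` in the standard library.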
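The Dataset Splits row reports accuracy, MRR and DCG@K averaged over 10 repetitions of 10-fold cross-validation. The sketch below shows one common way to compute these per-query metrics and to generate repeated, shuffled k-fold splits. It assumes a single correct candidate per query (binary relevance), a log2(rank + 1) discount for DCG, and accuracy defined as "the gold candidate is ranked first"; the paper's exact metric definitions are not quoted here, and all function names are hypothetical.

```python
import math
import random
from typing import Iterator


def top1_accuracy(ranked: list[str], gold: str) -> float:
    """1 if the gold candidate is ranked first, else 0."""
    return 1.0 if ranked and ranked[0] == gold else 0.0


def reciprocal_rank(ranked: list[str], gold: str) -> float:
    """1/rank of the gold candidate, or 0 if it was not retrieved."""
    for i, cand in enumerate(ranked, start=1):
        if cand == gold:
            return 1.0 / i
    return 0.0


def dcg_at_k(ranked: list[str], gold: str, k: int) -> float:
    """DCG@K with binary relevance: gain 1 only for the gold candidate."""
    return sum(1.0 / math.log2(i + 1)
               for i, cand in enumerate(ranked[:k], start=1) if cand == gold)


def repeated_kfold(n_items: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_items))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, folds[f]
```

One way to obtain a single figure is to average each metric over the queries of every test fold and then over all 100 folds (10 repetitions × 10 folds); whether the authors averaged per fold or per query is not stated in the excerpt.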
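The Experiment Setup row cites a damping factor d, typically 0.85 following Brin and Page (1998). To show what d controls, here is a minimal power-iteration sketch of standard PageRank, PR(v) = (1 - d)/N + d · Σ_{u→v} PR(u)/outdeg(u), in which pages without out-links spread their score uniformly. This is an assumption-laden illustration, not the authors' code: the teleport term is uniform, whereas a personalized variant such as the PPR approach mentioned in the Hardware Specification row would replace 1/N with a query-specific distribution. The function name and the toy graph are hypothetical.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a directed graph given as adjacency lists."""
    # Collect every node, including pages that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by dangling pages (no out-links) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u, targets in links.items():
            if targets:
                share = d * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

For the toy graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, C collects links from both A and B and ends up with the highest score; the scores always sum to 1. Lowering d puts more weight on the uniform teleport term and flattens the ranking.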
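The Experiment Setup row notes that ListNet was trained with RankLib's suggested settings: 1500 epochs and a learning rate of 10^-5. To make those two parameters concrete, the sketch below implements ListNet's listwise top-one cross-entropy loss, minimized by per-query gradient descent. It assumes a linear scoring function and plain Python; it is not RankLib's implementation, and the function names and data layout (a list of (feature vectors, relevance labels) pairs, one per query) are hypothetical.

```python
import math


def softmax(values: list[float]) -> list[float]:
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score(w: list[float], x: list[float]) -> float:
    """Linear scoring function (an assumption made for brevity)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def listnet_loss(w: list[float], queries) -> float:
    """Sum over queries of the cross-entropy between top-one distributions."""
    total = 0.0
    for feats, labels in queries:
        p_true = softmax([float(y) for y in labels])
        p_model = softmax([score(w, x) for x in feats])
        total -= sum(pt * math.log(pm) for pt, pm in zip(p_true, p_model))
    return total


def train_listnet(queries, n_features: int, epochs: int = 1500,
                  lr: float = 1e-5) -> list[float]:
    """Gradient descent on the top-one loss, updating once per query."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for feats, labels in queries:
            p_true = softmax([float(y) for y in labels])
            p_model = softmax([score(w, x) for x in feats])
            # d(loss)/dw = sum_j (P_model(j) - P_true(j)) * x_j
            grad = [0.0] * n_features
            for x, pt, pm in zip(feats, p_true, p_model):
                for f in range(n_features):
                    grad[f] += (pm - pt) * x[f]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w
```

A learning rate as small as 10^-5 makes each update tiny, so many epochs are needed before the weights move appreciably. With only two parameters, the sketch makes that trade-off easy to see.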