reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision

Authors: Byron C. Wallace, Joël Kuiper, Aakash Sharma, Mingxi (Brian) Zhu, Iain J. Marshall

JMLR 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate our approach both retrospectively (using previously collected data) and via a prospective evaluation. We demonstrate that SDS consistently improves performance with respect to baselines that exploit only distant or (a small amount of) direct supervision.
Researcher Affiliation	Collaboration	Byron C. Wallace EMAIL College of Computer and Information Science, Northeastern University, Boston, MA, USA; Joël Kuiper EMAIL Doctor Evidence, Santa Monica, CA, USA; Aakash Sharma EMAIL Department of Chemistry, University of Texas at Austin, Austin, TX, USA; Mingxi (Brian) Zhu EMAIL Department of Computer Science, University of Texas at Austin, Austin, TX, USA; Iain J. Marshall EMAIL Department of Primary Care & Public Health Sciences, Faculty of Life Sciences & Medicine, King's College London, London, UK
Pseudocode	No	The paper describes algorithms and methods using mathematical notation and descriptive text, but does not include explicit pseudocode blocks or algorithms labeled as such.
Open Source Code	No	The paper mentions integrating models into the Robot Reviewer tool (https://robot-reviewer.vortext.systems/), but does not explicitly state that the source code for the proposed SDS methodology itself is openly released or provide a direct repository link for it. The link provided for annotation guidelines (http://byron.ischool.utexas.edu/static/sds-guidelines.pdf) is not for code.
Open Datasets	Yes	We next describe the Cochrane Database of Systematic Reviews (CDSR) (The Cochrane Collaboration, 2014), which is the database we used to derive DS. ... The Cochrane Database of Systematic Reviews, 2014. URL http://www.thecochranelibrary.com.
Dataset Splits	Yes	We performed five-fold validation on the 133 articles for which candidate sentences were directly labeled across all three PICO elements (recall that we group Intervention and Comparator together).
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies	Yes	Specifically, we used the implementation in the Python machine learning library scikit-learn (Pedregosa et al., 2011) v0.17, with default estimation parameters save for class weight which we set to balanced. ... The parameters of the SDS model (i.e., w in Equation 1) were estimated using LIBLINEAR (Fan et al., 2008).
Experiment Setup	Yes	For all models, class weights were set inversely to their prevalences in the training dataset (mistakes on the rare class positive instances were thus more severely penalized). For distant and direct only models, we conducted a line-search over C values from 10 up to 10^5, taking logarithmically spaced steps. ... For the SDS model (Equation 6) we performed grid search over λ and C values. Specifically we searched over λ = {2, 10, 50, 100, 200, 500} and the same set of C values specified above.
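
The setup quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. Several details are assumptions: the C range is read as 10 to 10^5 with one step per power of ten (the quote gives only the endpoints and "logarithmically spaced"), the class weights use the n / (k * count) formula behind scikit-learn's "balanced" setting, folds are formed at the article level with an arbitrary seed, and all function and variable names are hypothetical.

```python
import itertools
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its prevalence: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Assumption: one C value per power of ten, from 10 up to 10^5.
C_VALUES = [10 ** k for k in range(1, 6)]

# Lambda values as listed in the Experiment Setup row.
LAMBDA_VALUES = [2, 10, 50, 100, 200, 500]

# Grid for the SDS model: every (lambda, C) pair.
SDS_GRID = list(itertools.product(LAMBDA_VALUES, C_VALUES))


def article_folds(article_ids, n_folds=5, seed=0):
    """Shuffle article IDs and deal them into n_folds disjoint folds."""
    ids = list(article_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def train_test_splits(folds):
    """Yield (train_ids, test_ids), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [a for j, fold in enumerate(folds) if j != i for a in fold]
        yield train, test


if __name__ == "__main__":
    # Toy label vector: 9 negatives and 1 positive sentence.
    print(balanced_class_weights([0] * 9 + [1]))
    folds = article_folds(range(133))
    print([len(f) for f in folds], len(SDS_GRID))
```

Splitting by article rather than by sentence keeps all sentences from one trial report in the same fold. The quoted text suggests this by describing validation "on the 133 articles", but it remains an interpretation.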