Design and Analysis of the NIPS 2016 Review Process

Authors: Nihar B. Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon, Ulrike von Luxburg

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "In this paper, we analyze several aspects of the data collected during the review process, including an experiment investigating the efficacy of collecting ordinal rankings from reviewers. We make a number of key observations, provide suggestions that may be useful for subsequent conferences, and discuss open problems towards the goal of improving peer review."
Researcher Affiliation: Academia. Nihar B. Shah (EMAIL), Machine Learning Department and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Behzad Tabibian (EMAIL), Max Planck Institute for Intelligent Systems and Max Planck Institute for Software Systems, Tübingen, Germany; Krikamol Muandet (EMAIL), Max Planck Institute for Intelligent Systems, Tübingen, Germany; Isabelle Guyon (EMAIL), Université Paris-Saclay, France, and ChaLearn, California; Ulrike von Luxburg (EMAIL), University of Tübingen and Max Planck Institute for Intelligent Systems, Tübingen, Germany.
Pseudocode: No. The paper describes procedures and methods in paragraph text and numbered steps (e.g., for the messy middle model), but does not contain any clearly labeled pseudocode or algorithm blocks with structured formatting.
Open Source Code: No. The paper mentions that authors Behzad Tabibian and Krikamol Muandet "were also the workflow team of NIPS 2016 and were responsible for all the programs, scripts and CMT-related issues during the review process." However, there is no explicit statement about making their code or scripts open source for this paper's analysis or method, and no links or repositories are provided.
Open Datasets: No. The paper analyzes "the data collected during the review process" of NIPS 2016 and NIPS 2015. This data appears to be proprietary to the NIPS conference organizers, and there is no indication that it is publicly available. No links, DOIs, or citations to public datasets are provided for the data analyzed in the paper.
Dataset Splits: Yes. "Wherever applicable, we also perform our analyses on a subset of the submitted papers which we term as the top 2k papers. The top 2k papers comprise all of the 568 accepted papers, and an equal number (568) of the rejected papers. The 568 rejected papers are chosen as those with the maximum mean score (where the mean for any paper is taken across all criteria and all reviewers)."
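The top-2k selection described above can be sketched in a few lines: take all accepted papers, then add an equal number of rejected papers with the highest mean scores. The data below is synthetic and the acceptance rule is a toy stand-in; only the selection logic mirrors the paper's description.

```python
import numpy as np

# Hypothetical inputs: per-paper mean review scores and accept decisions
# (synthetic stand-ins; the real NIPS 2016 data is not public).
rng = np.random.default_rng(0)
n_papers = 2425
mean_score = rng.uniform(2.0, 8.0, size=n_papers)
accepted = mean_score + rng.normal(0.0, 1.0, n_papers) > 6.5  # toy decision rule

accepted_idx = np.flatnonzero(accepted)
rejected_idx = np.flatnonzero(~accepted)

# Rejected papers with the highest mean scores, as many as were accepted.
k = accepted_idx.size
order = np.argsort(mean_score[rejected_idx])[::-1]
top_rejected = rejected_idx[order[:k]]

# The "top 2k" subset: all accepted plus the best-scoring rejected papers.
top_2k = np.concatenate([accepted_idx, top_rejected])
```

With the paper's numbers (568 accepted), this yields a balanced 1136-paper subset.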
Hardware Specification: No. The paper analyzes data from the NIPS 2016 review process using statistical methods. It does not describe any computational experiments that would require specific hardware, nor does it mention any hardware specifications for the analysis performed.
Software Dependencies: No. The paper mentions the "Toronto paper matching system or TPMS" and "CMT-related issues" as part of the NIPS 2016 review process, but it does not specify any software or library dependencies with version numbers used for the statistical analysis presented in the paper.
Experiment Setup: Yes. "All t-tests conducted correspond to two-sample t-tests with unequal variances. All mentions of p-values correspond to two-sided tail probabilities. All mentions of statistical significance correspond to a p-value threshold of 0.01 (we also provide the exact p-values alongside). Multiple testing is accounted for using the Bonferroni correction. The effect sizes refer to Cohen's d. Wherever applicable, the error bars in the figures represent 95% confidence intervals."
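The statistical recipe quoted above (Welch's two-sample t-test, two-sided p-values, a Bonferroni-corrected 0.01 threshold, and Cohen's d) can be sketched as follows. The two score samples are synthetic placeholders; only the test procedure follows the paper's stated setup.

```python
import numpy as np
from scipy import stats

# Synthetic review-score samples for two groups of papers (assumed data).
rng = np.random.default_rng(0)
group_a = rng.normal(5.5, 1.0, size=200)
group_b = rng.normal(5.2, 1.2, size=200)

# Two-sample t-test with unequal variances (Welch's t-test), two-sided p-value.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Bonferroni correction: with m simultaneous tests, compare each p-value
# against alpha / m for a family-wise threshold of alpha = 0.01.
m = 10
significant = p_value < 0.01 / m

# Cohen's d, using the average of the two sample variances as the pooled SD.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2.0)
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd
```

`equal_var=False` is what makes `scipy.stats.ttest_ind` compute Welch's test rather than the pooled-variance Student's t-test.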