Design and Analysis of the NIPS 2016 Review Process
Authors: Nihar B. Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon, Ulrike von Luxburg
JMLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we analyze several aspects of the data collected during the review process, including an experiment investigating the efficacy of collecting ordinal rankings from reviewers. We make a number of key observations, provide suggestions that may be useful for subsequent conferences, and discuss open problems towards the goal of improving peer review. |
| Researcher Affiliation | Academia | Nihar B. Shah EMAIL Machine Learning Department and Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213, USA; Behzad Tabibian EMAIL Max Planck Institute for Intelligent Systems, and Max Planck Institute for Software Systems Tübingen, Germany; Krikamol Muandet EMAIL Max Planck Institute for Intelligent Systems Tübingen, Germany; Isabelle Guyon EMAIL Université Paris-Saclay, France, and ChaLearn, California; Ulrike von Luxburg EMAIL University of Tübingen, and Max Planck Institute for Intelligent Systems, Tübingen, Germany |
| Pseudocode | No | The paper describes procedures and methods in paragraph text and numbered steps (e.g., for the messy middle model), but does not contain any clearly labeled pseudocode or algorithm blocks with structured formatting. |
| Open Source Code | No | The paper mentions that authors Behzad Tabibian and Krikamol Muandet "were also the workflow team of NIPS 2016 and were responsible for all the programs, scripts and CMT-related issues during the review process." However, there is no explicit statement about making their own code or scripts open-source for this paper's analysis or method. No links or repositories are provided. |
| Open Datasets | No | The paper analyzes "the data collected during the review process" of NIPS 2016 and NIPS 2015. This data appears to be proprietary to the NIPS conference organizers and there is no indication that it is publicly available. No links, DOIs, or citations to public datasets are provided for the data analyzed in the paper. |
| Dataset Splits | Yes | Wherever applicable, we also perform our analyses on a subset of the submitted papers which we term the top 2k papers. The top 2k papers comprise all of the 568 accepted papers, and an equal number (568) of the rejected papers. The 568 rejected papers are chosen as those with the maximum mean score (where the mean for any paper is taken across all its reviewers). |
| Hardware Specification | No | The paper analyzes data from the NIPS 2016 review process and statistical methods. It does not describe any computational experiments that would require specific hardware, nor does it mention any hardware specifications for the analysis performed. |
| Software Dependencies | No | The paper mentions the "Toronto paper matching system or TPMS" and "CMT-related issues" as part of the NIPS 2016 review process, but it does not specify any software or library dependencies with version numbers used for the statistical analysis presented in the paper. |
| Experiment Setup | Yes | All t-tests conducted correspond to two-sample t-tests with unequal variances. All mentions of p-values correspond to two-sided tail probabilities. All mentions of statistical significance correspond to a p-value threshold of 0.01 (we also provide the exact p-values alongside). Multiple testing is accounted for using the Bonferroni correction. The effect sizes refer to Cohen's d. Wherever applicable, the error bars in the figures represent 95% confidence intervals. |
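The "top 2k papers" construction quoted above is a simple selection rule: keep every accepted paper, then add an equal number of rejected papers ranked by their mean review score. A minimal sketch of that rule, with a hypothetical function name (the paper does not publish code, so this is an illustration of the stated procedure, not the authors' implementation):

```python
def top_2k_papers(mean_scores, accepted):
    """Select the 'top 2k' subset described in the paper: all accepted
    papers, plus an equal number of the highest-scoring rejected papers.

    mean_scores: per-paper mean review score (list of floats)
    accepted:    per-paper acceptance flag (list of bools)
    Returns the indices of the selected papers.
    """
    accepted_idx = [i for i, a in enumerate(accepted) if a]
    rejected_idx = [i for i, a in enumerate(accepted) if not a]
    k = len(accepted_idx)  # 568 for NIPS 2016
    # Rejected papers with the maximum mean score, ties broken arbitrarily.
    top_rejected = sorted(rejected_idx,
                          key=lambda i: mean_scores[i],
                          reverse=True)[:k]
    return accepted_idx + top_rejected
```

For example, with one accepted paper and three rejected ones, the function returns the accepted paper plus the single best-scoring rejected paper.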
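The experiment-setup row names a standard statistical toolkit: Welch's two-sample t-test (unequal variances), a Bonferroni-corrected significance threshold of 0.01, and Cohen's d effect sizes. A self-contained sketch of those three ingredients follows; the function names are hypothetical, and computing an actual p-value from the Welch statistic would additionally require a t-distribution CDF (e.g. from `scipy.stats`), which is omitted here:

```python
import math

def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite
    degrees of freedom, for samples with unequal variances."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2 = vx / nx + vy / ny
    t = (mx - my) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

def cohens_d(x, y):
    """Cohen's d effect size using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / sp

def bonferroni_threshold(alpha=0.01, m=1):
    """Bonferroni correction: with m tests, each two-sided p-value is
    compared against alpha / m instead of alpha."""
    return alpha / m
```

With identical samples the t statistic and Cohen's d are both zero, and with five tests at the paper's 0.01 level the per-test threshold becomes 0.002.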