Statistical Inference for Noisy Incomplete Binary Matrix

Authors: Yunxiao Chen, Chengcheng Li, Jing Ouyang, Gongjun Xu

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "A simulation study is given in Section 4, and two real-data applications are presented in Section 5. ... We study the finite-sample performance of the likelihood-based estimator. ... For each setting, 2000 independent data sets are generated from the considered model."
Researcher Affiliation | Academia | Yunxiao Chen (Department of Statistics, London School of Economics and Political Science, London WC2A 2AE, UK); Chengcheng Li, Jing Ouyang, Gongjun Xu (Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA)
Pseudocode | Yes | "Algorithm 1: Projected Gradient Descent Algorithm. Input: partially observed data matrix Y, learning rates γ1 and γ2, tolerance ϵ, and initial values θ^(1) = (θ_1^(1), ..., θ_N^(1))^T and β^(1) = (β_1^(1), ..., β_J^(1))^T. Initialize l^(0) = −∞ and l^(1) = l(θ^(1), β^(1)), and iteration number t = 1; while |l^(t) − l^(t−1)| > ϵ do ..."
Open Source Code | Yes | "The R code for our numerical experiments can be found in https://github.com/Austinlccvic/A-Note-on-Statistical-Inference-for-Noisy-Incomplete-1-Bit-Matrix."
Open Datasets | Yes | "The data set is a benchmark data set for studying linking methods for educational testing (González and Wiberg, 2017). It contains binary responses from two forms of a college admission test."
Dataset Splits | No | The paper describes the composition of the datasets (e.g., N = 4000, J = 200, 40% missing for educational testing; N = 139, J = 1648, 26.1% missing for Senate voting) and how simulated data are generated, but it does not specify explicit training/validation/test splits needed to reproduce the experimental results.
Hardware Specification | No | The paper mentions "average CPU time per iteration" in the simulation study (Section 4), but does not provide any specific details about the CPU model, GPU, memory, or other hardware used for running the experiments.
Software Dependencies | No | The paper states, "The R code for our numerical experiments can be found in https://github.com/Austinlccvic/A-Note-on-Statistical-Inference-for-Noisy-Incomplete-1-Bit-Matrix." While it specifies the programming language (R) and links to the code, it does not give version numbers for R or for any R packages used, which are necessary for reproducibility.
Experiment Setup | Yes | "Algorithm 1: Projected Gradient Descent Algorithm. Input: partially observed data matrix Y, learning rates γ1 and γ2, tolerance ϵ, and initial values θ^(1) = (θ_1^(1), ..., θ_N^(1))^T and β^(1) = (β_1^(1), ..., β_J^(1))^T. ... The convergence criterion is that the consecutive change in the joint log-likelihood is smaller than 0.001."
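The quoted Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' R implementation: it assumes the paper's low-rank logistic model P(Y_ij = 1) = σ(θ_i + β_j) fit by likelihood ascent over the observed entries, with the stopping rule quoted above (consecutive log-likelihood change below ϵ). The projection onto a box [−C, C], the default step sizes, and the simultaneous (rather than alternating) update of θ and β are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_lik(Y, mask, theta, beta):
    # Joint log-likelihood over observed entries only (mask = 1 where observed).
    M = theta[:, None] + beta[None, :]
    return np.sum(mask * (Y * M - np.log1p(np.exp(M))))

def pgd(Y, mask, gamma1=0.01, gamma2=0.01, eps=1e-3, C=5.0, max_iter=10_000):
    """Projected gradient ascent for P(Y_ij = 1) = sigmoid(theta_i + beta_j).

    The box bound C and the simultaneous updates are assumptions of this sketch.
    Stops when the consecutive change in the joint log-likelihood is <= eps.
    """
    N, J = Y.shape
    theta, beta = np.zeros(N), np.zeros(J)
    l_prev = -np.inf  # so the loop runs at least once
    for _ in range(max_iter):
        M = theta[:, None] + beta[None, :]
        R = mask * (Y - sigmoid(M))  # score (residual) on observed entries
        # Gradient steps, then projection onto the box [-C, C].
        theta = np.clip(theta + gamma1 * R.sum(axis=1), -C, C)
        beta = np.clip(beta + gamma2 * R.sum(axis=0), -C, C)
        l_curr = log_lik(Y, mask, theta, beta)
        if abs(l_curr - l_prev) <= eps:
            break
        l_prev = l_curr
    return theta, beta
```

Each iteration costs O(NJ) for the dense residual matrix, which matches the "average CPU time per iteration" framing in the paper's simulation study; the tolerance eps mirrors the 0.001 convergence threshold quoted above.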