reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Monotone Missing Data: A Blessing and a Curse

Authors: Santtu Tikka, Juha Karvanen

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we consider missing data models represented by directed acyclic graphs (DAGs) and scenarios where the assumption of a monotonic relationship between response indicators enables us to identify distributions of interest that would otherwise be nonidentifiable, and the converse, where the same assumption renders otherwise identifiable distributions nonidentifiable. To the best of our knowledge, there are no previous graphical criteria or algorithms for determining identifiability or nonidentifiability of the missing data distribution under monotone missing data in nonparametric MNAR settings for missing data DAGs. The rest of the paper is organized as follows. Section 2 introduces the notation and the relevant definitions. Section 3 considers missing data models and monotone missingness. Section 4 discusses identifiability in missing data models and the applicability of previous identifiability results for nonmonotone missingness under monotone missingness. Sections 5 and 6 present new results for identifiability and nonidentifiability under monotone missingness, respectively. Section 7 concludes the paper with a discussion.
Researcher Affiliation	Academia	Santtu Tikka, Department of Mathematics and Statistics, University of Jyväskylä, Finland; Juha Karvanen, Department of Mathematics and Statistics, University of Jyväskylä, Finland
Pseudocode	No	The paper presents theoretical concepts, definitions, theorems, and proofs related to identifiability in missing data models, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any statements or links indicating the availability of open-source code for the methodology described.
Open Datasets	No	The paper discusses theoretical concepts using example scenarios like 'an intervention program to improve the physical condition of the participants' or 'an epidemiological study on hobbies and cognitive abilities'. These are illustrative and do not refer to specific, publicly available datasets used for empirical analysis.
Dataset Splits	No	The paper is theoretical and does not present experiments based on specific datasets; it therefore does not mention dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any experimental setup or computational results that would require specific hardware. Therefore, no hardware specifications are provided.
Software Dependencies	No	The paper is theoretical and focuses on mathematical identifiability results in missing data models. It does not mention any specific software, libraries, or versions used for implementation or analysis.
Experiment Setup	No	The paper is theoretical and presents mathematical results and proofs regarding identifiability. It does not include any experimental setup details, hyperparameters, or training configurations.