reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal

Authors: Deheng Yuan, Tao Guo, Zhongyi Huang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models. To establish our results, we need to prove both the upper and lower bounds for the minimax rate.
Researcher Affiliation	Academia	¹Department of Mathematical Sciences, Tsinghua University, Beijing, China. ²School of Cyber Science and Engineering, Southeast University, Nanjing, China.
Pseudocode	No	The paper describes a 'layered estimation protocol' in Section 4.1, but it is presented in descriptive text rather than structured pseudocode or an algorithm block.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code or provide links to code repositories.
Open Datasets	No	The paper discusses estimation settings such as density estimation and Gaussian, binary, Poisson and heteroskedastic regression models as special cases of its theoretical framework. These are sample-generation models used for theoretical analysis, not open datasets used in experiments. No concrete access information for any dataset is provided.
Dataset Splits	No	The paper is theoretical and does not conduct experiments on specific datasets, therefore no dataset split information is provided.
Hardware Specification	No	The paper is theoretical and does not describe any experimental setup or specific hardware used for computations.
Software Dependencies	No	The paper is theoretical and does not describe any software implementations or list specific software dependencies with version numbers.
Experiment Setup	No	The paper focuses on theoretical derivations and proofs of optimal rates for nonparametric estimation, and therefore does not provide experimental setup details or hyperparameters.