Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal

Authors: Deheng Yuan, Tao Guo, Zhongyi Huang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models. To establish our results, we need to prove both the upper and lower bounds for the minimax rate.
Researcher Affiliation | Academia | (1) Department of Mathematical Sciences, Tsinghua University, Beijing, China. (2) School of Cyber Science and Engineering, Southeast University, Nanjing, China.
Pseudocode | No | The paper describes a 'layered estimation protocol' in Section 4.1, but it is presented in descriptive text rather than structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to code repositories.
Open Datasets | No | The paper discusses estimation settings such as density estimation and Gaussian, binary, Poisson and heteroskedastic regression models as special cases of its theoretical framework. These are sample-generation models used for theoretical analysis, not open datasets used in experiments. No concrete access information for any dataset is provided.
Dataset Splits | No | The paper is theoretical and does not conduct experiments on specific datasets; therefore, no dataset split information is provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or specific hardware used for computations.
Software Dependencies | No | The paper is theoretical and does not describe any software implementations or list specific software dependencies with version numbers.
Experiment Setup | No | The paper focuses on theoretical derivations and proofs of optimal rates for nonparametric estimation, and therefore does not provide experimental setup details or hyperparameters.