Near-optimal Active Regression of Single-Index Models

Authors: Yi Li, Wai Ming Tai

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work presents the first algorithm that provides a (1+ε)-approximate solution by querying Õ(d^{(p/2)∨1}/ε^{p∨2}) entries of b. This query complexity is also shown to be optimal up to logarithmic factors for p ∈ [1, 2], and the ε-dependence 1/ε^p is shown to be optimal for p > 2. The main result is that querying O((d/ε²) polylog n) entries of b suffices for a (1+ε)-approximation when 1 ≤ p ≤ 2, and O((d^{p/2}/ε^p) polylog n) entries when p > 2. This matches, up to logarithmic factors, the query complexity of the constant-factor approximation algorithm of (Gajjar et al., 2024) for p = 2.
Theorem 1. There is a randomized algorithm that, given A ∈ R^{n×d}, b ∈ R^n, f ∈ Lip_L and an arbitrary sufficiently small ε > 0, with probability at least 0.9 makes O(d^{1∨(p/2)}/ε^{2∨p} · polylog(d/ε)) queries to the entries of b and returns an x̂ ∈ R^d such that ‖f(Ax̂) − b‖_p^p ≤ OPT + ε(OPT + L^p‖Ax^*‖_p^p) (4), where x^* attains OPT = min_x ‖f(Ax) − b‖_p^p. The hidden constant in the bound on the number of queries depends only on p.
Theorem 2. Suppose that p ≥ 1, ε > 0 is sufficiently small and n ≥ (d log d)/ε². Any randomized algorithm that, given A ∈ R^{n×d}, query access to the entries of an unknown b ∈ R^n and f ∈ Lip_1, outputs a d-dimensional vector x̂ such that (4) holds with probability at least 4/5 must make Ω(d/ε²) queries to the entries of b.
Theorem 3. Suppose that p > 2, ε > 0 is sufficiently small and n ≥ d/ε^p. Any randomized algorithm that, given A ∈ R^{n×d}, query access to the entries of an unknown b ∈ R^n and f ∈ Lip_1, outputs a d-dimensional vector x̂ such that (4) holds with probability at least 4/5 must make Ω(d/ε^p) queries to the entries of b.
Researcher Affiliation | Academia | Yi Li, School of Physical and Mathematical Sciences and College of Computing and Data Science, Nanyang Technological University, EMAIL; Wai Ming Tai, Independent Researcher, EMAIL
Pseudocode | Yes | Algorithm 1: Generating a Sampling Matrix, GSM(k₁, …, kₙ, α). Input: n integers k₁, …, kₙ ≥ 0; a sampling rate α < 1. Algorithm 2: Algorithm for Active Learning without Dependence on n. Input: a matrix A ∈ R^{n×d}; query access to the entries of the vector b ∈ R^n; a function f ∈ Lip_L; an error parameter ε; two sampling rates α′ < α < 1. Algorithm 3: Algorithm for Active Learning. Input: a matrix A ∈ R^{n×d}; query access to the entries of the vector b ∈ R^n; a function f ∈ Lip_L; an error parameter ε; a sampling rate α < 1.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to any code repositories.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and query complexity rather than empirical evaluation on specific datasets. Therefore, it does not mention the use or availability of any open datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets. Therefore, it does not provide information about dataset splits.
Hardware Specification | No | The paper is theoretical, presenting algorithms and proofs for active regression. It does not describe any experimental implementation or the hardware used to run experiments.
Software Dependencies | No | The paper describes theoretical algorithms and their complexity. It does not mention any specific software packages, libraries, or their version numbers that would be used for implementation.
Experiment Setup | No | The paper is theoretical, presenting algorithms and their theoretical guarantees. It does not include any experimental setup details such as hyperparameters, training configurations, or system-level settings.
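The upper bounds quoted under Research Type combine into a single leading-order query budget, Õ(d^{max(1, p/2)}/ε^{max(2, p)}). A minimal sketch of how that budget scales, ignoring the polylog factors and the p-dependent constant:

```python
def query_budget(d: int, eps: float, p: float) -> float:
    """Leading-order number of queries to b from Theorem 1,
    ignoring polylog(d/eps) factors and the p-dependent constant.
    For 1 <= p <= 2 this is d / eps^2; for p > 2 it is d^{p/2} / eps^p."""
    return d ** max(1.0, p / 2) / eps ** max(2.0, p)

print(query_budget(100, 0.5, 1.5))  # d / eps^2      -> 400.0
print(query_budget(100, 0.5, 4.0))  # d^2 / eps^4    -> 160000.0
```

For p ∈ [1, 2] this matches the Ω(d/ε²) lower bound of Theorem 2 up to logarithmic factors; for p > 2, Theorem 3 certifies only that the 1/ε^p dependence is necessary.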
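The review lists only the input signatures of the algorithms, not their bodies. As an illustration of what a GSM-style subroutine could look like, here is a hypothetical sketch that assumes row i is kept independently with probability pᵢ = min(1, α·kᵢ) and, if kept, rescaled by 1/pᵢ (standard importance sampling, which keeps the sample unbiased in expectation); the paper's actual construction may differ.

```python
import random

def gsm(ks, alpha, rng=random.Random(0)):
    """Hypothetical sampling-matrix generator: the sampled matrix is
    represented sparsely as a list of (row_index, weight) pairs.
    Row i is kept with probability p_i = min(1, alpha * k_i) and
    carries weight 1 / p_i so each row's contribution is unbiased."""
    sample = []
    for i, k in enumerate(ks):
        p_i = min(1.0, alpha * k)
        if p_i > 0 and rng.random() < p_i:
            sample.append((i, 1.0 / p_i))
    return sample

# When alpha * k_i >= 1 a row is kept deterministically with weight 1,
# and rows with k_i = 0 are never sampled.
print(gsm([2, 0, 2], alpha=0.5))  # [(0, 1.0), (2, 1.0)]
```

Returning (index, weight) pairs rather than an explicit n×n matrix keeps the output size proportional to the number of sampled rows, which is the point of an active (query-efficient) algorithm.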