reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Statistical Performance of Collaborative Inference

Authors: Gérard Biau, Kevin Bleakley, Benoît Cadre

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 5 we present the remarkable Ramanujan expander graphs and analyze the tradeoff between statistical efficiency and communication complexity for these graphs with a series of simulation studies. Lastly, Section 6 provides several elements for analysis of more complicated asynchronous models with delays. Figure 4 shows results for 3- and 5-regular Ramanujan-type matrices (A3 and A5) as well as the previous results for non-Ramanujan-type matrices A0, A1, and A2 (see Figure 2). Figure 5: Statistical efficiency vs communication complexity tradeoff for four different node communication penalties β. d is the d which minimizes S (Ad) + βC (Ad). Figure 6: Optimizing the number of nodes N and the level of communication d required between nodes to obtain a performance ratio τt(Ad) ≥ 0.99 given a large fixed quantity of data T.
Researcher Affiliation	Academia	Gerard Biau EMAIL Laboratoire de Statistique Théorique et Appliquée, FRE CNRS 3684 Université Pierre et Marie Curie Boîte 158, 4 place Jussieu 75005, Paris, France Kevin Bleakley EMAIL INRIA Saclay Ile-de-France 1 rue Honoré d'Estienne d'Orves 91120, Palaiseau, France Benoît Cadre EMAIL IRMAR, ENS Rennes Campus de Ker Lann Avenue Robert Schuman 35170 Bruz, France
Pseudocode	No	The paper describes methods using mathematical equations and prose. No explicitly labeled pseudocode blocks or algorithms (e.g., 'Algorithm 1') are present.
Open Source Code	No	The paper does not contain any explicit statements about making source code available, nor does it provide links to repositories or mention code in supplementary materials.
Open Datasets	No	In the trials shown, i.i.d. uniform random variables on [0, 1] are delivered online to N = 5 nodes, one to each at each time t. The paper uses simulated data (i.i.d. uniform random variables) rather than a named, publicly available dataset. No specific dataset source, link, or citation is provided.
Dataset Splits	No	The paper uses simulated data (i.i.d. uniform random variables). As such, it does not describe or require training/test/validation dataset splits typically found with pre-existing datasets.
Hardware Specification	No	The paper conducts "simulation studies" but does not specify any hardware details (e.g., CPU, GPU models, or memory) used for these simulations.
Software Dependencies	No	The paper describes theoretical models and simulation studies but does not specify any software dependencies (e.g., programming languages, libraries, or tools with version numbers) used for its implementation or analysis.
Experiment Setup	Yes	In the trials shown, i.i.d. uniform random variables on [0, 1] are delivered online to N = 5 nodes, one to each at each time t. [...] Figure 6: Optimizing the number of nodes N and the level of communication d required between nodes to obtain a performance ratio τt(Ad) ≥ 0.99 given a large fixed quantity of data T. The minimum is found at (N, d ) = (710, 3), suggesting that with 100 million data points, one can get excellent performance results (τt(Ad ) ≥ 0.99) for a low cost with around 700 nodes, each connected only to three other nodes! Increasing N further raises the cost necessary to obtain the same performance, both due to the price of adding more nodes, as well as requiring more connections between them: d must increase to 4, 5, and so on.
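
Since the paper ships no code, the simulation setup quoted in the Experiment Setup row (i.i.d. Uniform[0, 1] draws delivered online to N = 5 nodes, one per node per time step) can be illustrated with a minimal sketch. Everything beyond that quoted setup is an assumption made here for illustration: the function name `simulate`, the mixing matrices `IDENTITY` and `COMPLETE`, and in particular the update rule, in which each node mixes its neighbors' estimates through a doubly stochastic matrix `A` and then folds in its own new observation as a running mean. This is not presented as the authors' implementation, and it does not reproduce the paper's matrices A0 to A5 or its figures.

```python
import random


def simulate(A, T, seed=0):
    """Return each node's estimate of the mean after T online rounds.

    A is an n x n doubly stochastic mixing matrix (list of lists).
    The update rule below is an illustrative assumption, not taken
    from the paper.
    """
    rng = random.Random(seed)
    n = len(A)
    theta = [0.0] * n
    for t in range(T):
        # One Uniform[0, 1] observation delivered to each node.
        x = [rng.random() for _ in range(n)]
        # Each node averages the current estimates of its neighbors.
        mixed = [sum(A[i][j] * theta[j] for j in range(n)) for i in range(n)]
        # Running-mean update with the node's own new observation.
        theta = [(t * mixed[i] + x[i]) / (t + 1) for i in range(n)]
    return theta


N = 5
# No communication: every node keeps only its own running mean.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
# Full communication: every node averages over all nodes.
COMPLETE = [[1.0 / N] * N for _ in range(N)]

if __name__ == "__main__":
    for name, A in (("identity", IDENTITY), ("complete", COMPLETE)):
        estimates = simulate(A, T=2000)
        print(name, [round(e, 4) for e in estimates])
```

With the identity matrix each node sees only its own T observations, while with the complete matrix the nodes effectively pool information; comparing the spread of the two sets of estimates around 0.5 gives a rough feel for the efficiency gain that communication buys, which is the tradeoff the quoted figures study.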