reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MESSY Estimation: Maximum-Entropy based Stochastic and Symbolic densitY Estimation

Authors: Tony Tohme, Mohsen Sadr, Kamal Youcef-Toumi, Nicolas Hadjiconstantinou

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the proposed MESSY estimation method against other benchmark methods for the case of a bi-modal and a discontinuous density, as well as a density at the limit of physical realizability. We find that the addition of a symbolic search for basis functions improves the accuracy of the estimation at a reasonable additional computational cost. Our results suggest that the proposed method outperforms existing density recovery methods in the limit of a small to moderate number of samples by providing a low-bias and tractable symbolic description of the unknown density at a reasonable computational cost.
Researcher Affiliation	Academia	Tony Tohme EMAIL Massachusetts Institute of Technology, USA. Mohsen Sadr EMAIL Massachusetts Institute of Technology, USA. Paul Scherrer Institute, Switzerland. Kamal Youcef-Toumi EMAIL Massachusetts Institute of Technology, USA. Nicolas G. Hadjiconstantinou EMAIL Massachusetts Institute of Technology, USA.
Pseudocode	Yes	Algorithm 1: Modified Gram-Schmidt. Algorithm 2: Multi-level, symbolic and recursive algorithm for density recovery. Algorithm 3: Pseudocode of the proposed MESSY estimation method. Algorithm 4: Newton's method for finding Lagrange multipliers of MED given moments µ for a given tolerance ϵ. Algorithm 5: Newton's method for finding Lagrange multipliers of MxED given moments µ and samples of prior XPrior FPrior for a given tolerance ϵ.
Open Source Code	No	The paper does not contain any explicit statement about releasing the code for the described methodology, nor does it provide a direct link to a code repository.
Open Datasets	No	The paper describes synthetic datasets generated from specific probability distributions (e.g., "a one-dimensional bi-modal distribution function constructed by mixing two Normal distribution functions N(x | µ, σ)", "samples from a distribution in this limit"). While the generation process is described, it does not provide concrete access information (URL, DOI, repository, or citation to an existing public dataset) for the specific sample sets used in the experiments.
Dataset Splits	No	The paper does not provide specific training/test/validation dataset splits. For density estimation, the method typically uses all available samples. While it mentions "K-fold cross-validation with K = 5" for optimizing the bandwidth of the baseline KDE method, this is not a general dataset split for the main proposed method (MESSY).
Hardware Specification	Yes	The execution time is computed on a single core and single thread of 2.3GHz Quad-Core Intel Core i7 processor and averaged over 5 ensembles.
Software Dependencies	No	The paper does not explicitly list any software dependencies with specific version numbers. It refers to standard concepts and algorithms but does not specify the software environment used for implementation (e.g., Python, PyTorch, or specific scientific computing libraries with versions).
Experiment Setup	Yes	Unless mentioned otherwise, we report error, time, and KL Divergence by ensemble averaging over 25 for different sets of samples. Furthermore, in the case of MESSY-S we perform Niters = 10 iterations, and we consider (+, , ) operators and (cos, sin) functions. For MxED and MESSY-P, we use Nb = Nm = 4. In the case of MESSY-S, we randomly create Nb basis functions which are O(x^4) (where Nb is sampled uniformly within {2, . . . , 8}). Both MESSY results are subject to a cross-entropy correction step with Nb = 4 polynomial moments.
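
The Open Datasets row notes that the experiments use synthetic samples, such as draws from a mixture of two Normal distributions, and that the exact sample sets are not published. As a minimal sketch of how such a sample set could be regenerated deterministically, the Python snippet below draws from a two-component Normal mixture with a fixed seed. It assumes NumPy; the function name `sample_bimodal` and all parameter values (means, standard deviations, mixture weight, seed, sample count) are hypothetical placeholders, not the values used in the paper.

```python
import numpy as np

def sample_bimodal(n, means=(-2.0, 2.0), stds=(1.0, 1.0), weight=0.5, seed=0):
    """Draw n samples from a two-component Normal mixture.

    All default parameters are hypothetical; they are not taken from the paper.
    `weight` is the probability of drawing from the first component.
    """
    rng = np.random.default_rng(seed)  # fixed seed makes the sample set repeatable
    # Pick a component per sample: False -> first Normal, True -> second Normal.
    second = rng.random(n) >= weight
    mu = np.where(second, means[1], means[0])
    sigma = np.where(second, stds[1], stds[0])
    # Draw each sample from the Normal selected for it.
    return rng.normal(mu, sigma)

samples = sample_bimodal(1000)
```

Calling `sample_bimodal` twice with the same arguments returns identical arrays, so a seed together with the distribution parameters is enough to pin down a sample set without distributing the data itself.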