The Cross-entropy of Piecewise Linear Probability Density Functions
Authors: Tom S. F. Haines
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental validation is presented, including a rigorous analysis of accuracy and a demonstration of using the presented result as the objective of a neural network. Previously, cross-entropy would need to be approximated via numerical integration, or equivalent, for which calculating gradients is impractical. Machine learning models with high parameter counts are optimised primarily with gradients, so if piecewise linear density representations are to be used then the presented analytic solution is essential. This paper contributes the necessary theory for the practical optimisation of information theoretic objectives when dealing with piecewise linear distributions directly. Removing this limitation expands the design space for future algorithms. |
| Researcher Affiliation | Academia | Tom S. F. Haines, EMAIL, Department of Computer Science, University of Bath |
| Pseudocode | No | The paper includes Python code in Appendix B, which is actual implementation code, not pseudocode or a clearly labeled algorithm block as defined by the question. |
| Open Source Code | Yes | A complete implementation, including code to generate the included figures, is in the supplementary material and also available from https://github.com/thaines/orogram. |
| Open Datasets | No | There is no data. |
| Dataset Splits | No | The paper states "There is no data.", so no dataset splits are provided. |
| Hardware Specification | No | The paper states only: "This research made use of Hex, the GPU Cloud in the Department of Computer Science at the University of Bath." This names a GPU cloud ('Hex') but lacks the specific GPU models, processor types, or detailed specifications needed for reproduction. |
| Software Dependencies | Yes | The below Python code is for Jax and has been developed with version 0.4.25. Validation, including of gradients, has been performed and may be found in the supplementary material alongside code for the demonstrations within the main text. |
| Experiment Setup | Yes | Nesterov’s accelerated gradient descent (Nesterov, 1983) is used, with 2048 iterations reducing the KL-divergence from 0.740 to 0.007. The network has two hidden layers of width 32, with Gaussian activations on all layers except the last, which remains linear. It is used as an offset (residual) for point positions, such that the final layer can be initialised with small values so it starts close to an identity transform. ADAM (Kingma & Ba, 2015) with 8192 iterations reduces the KL-divergence from 1.207 to 0.009. Stochastic gradient descent is used, i.e. each iteration a new sample of 256 points is drawn and pushed through the network for calculating the gradient. |
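The Research Type row notes that, before the paper's analytic result, the cross-entropy of piecewise linear densities had to be approximated via numerical integration, for which gradients are impractical. A minimal sketch of that baseline (all names hypothetical; this is not the paper's code or the orogram API) approximates H(p, q) = -&int; p(x) log q(x) dx with the trapezoid rule over densities given by their values at shared knots:

```python
import numpy as np

def cross_entropy_numeric(xs, p, q, eps=1e-12):
    """Trapezoid-rule approximation of H(p, q) = -integral p(x) log q(x) dx,
    where p and q are piecewise linear densities given by their values at
    the shared, sorted knot positions xs.  This is the kind of numerical
    approximation the paper's analytic solution replaces."""
    integrand = -p * np.log(np.maximum(q, eps))  # clamp avoids log(0)
    # Composite trapezoid rule over (possibly non-uniform) knot spacing.
    return 0.5 * np.sum((integrand[1:] + integrand[:-1]) * np.diff(xs))

# A triangular density on [0, 1] (peak at 0.5) against the uniform density.
xs = np.linspace(0.0, 1.0, 4097)
p = np.where(xs < 0.5, 4.0 * xs, 4.0 * (1.0 - xs))
q = np.ones_like(xs)

h_pq = cross_entropy_numeric(xs, p, q)  # -integral p log 1 = 0
h_pp = cross_entropy_numeric(xs, p, p)  # differential entropy: 1/2 + ln(1/2)
```

Even with 4097 knots the result is only approximate, and differentiating through such a quadrature with respect to the knot positions is what the paper describes as impractical; the analytic expression sidesteps the problem entirely.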
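The network quoted in the Experiment Setup row (two hidden layers of width 32, Gaussian activations everywhere except a linear final layer, used as a residual offset for point positions and initialised with small final-layer values so it starts near the identity) can be sketched as below. This is a hypothetical plain-NumPy reconstruction, not the paper's JAX code; in particular, reading "Gaussian activation" as exp(-x^2) and the 1e-3 initialisation scale are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, scale):
    """Random weights at the given scale, zero biases."""
    return scale * rng.standard_normal((n_in, n_out)), np.zeros(n_out)

def gaussian(x):
    # "Gaussian activation" assumed here to mean exp(-x^2).
    return np.exp(-x ** 2)

class ResidualWarp:
    """Point-position offset network per the quoted setup: two hidden
    layers of width 32, Gaussian activations on all layers except the
    last, which is linear and initialised with small values so the
    network starts close to an identity transform."""
    def __init__(self, width=32):
        self.w1, self.b1 = init_layer(1, width, 1.0)
        self.w2, self.b2 = init_layer(width, width, 1.0)
        self.w3, self.b3 = init_layer(width, 1, 1e-3)  # small -> near identity

    def __call__(self, x):
        h = gaussian(x @ self.w1 + self.b1)
        h = gaussian(h @ self.w2 + self.b2)
        return x + h @ self.w3 + self.b3  # residual offset on the input

x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
net = ResidualWarp()
y = net(x)  # at initialisation, y stays close to x
```

Training it as quoted would draw a fresh sample of 256 points each iteration, push them through the warp, and apply ADAM to the analytic cross-entropy objective; those steps are omitted here since the objective's closed form is the paper's contribution.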