Joint Learning of Energy-based Models and their Partition Function

Authors: Michael Eli Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our approach on multilabel classification and label ranking. ... We evaluate our models on classical multilabel classification datasets. ... Our results in Table 1 show that the logistic and sparsemax losses trained with our approach work better than the generalized Fenchel-Young loss as well as min-max and MCMC sampling approaches in various configurations. For the min-max approach, we use optimistic ADAM as solver, an MLP as generator and we use REINFORCE (score function estimator) for gradient estimation. For MCMC sampling, we use the standard Metropolis-Hastings algorithm with a uniform proposal distribution. We also present learning curves in Figure 1.
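For context on the MCMC baseline quoted above, here is a minimal sketch of Metropolis-Hastings sampling with a uniform proposal over binary label vectors. The energy function and dimensionality are illustrative placeholders, not the paper's actual model:

```python
import math
import random

def metropolis_hastings_uniform(energy, num_labels, num_steps, seed=0):
    """Sample binary label vectors y from p(y) proportional to exp(-energy(y))
    using Metropolis-Hastings with a uniform proposal distribution."""
    rng = random.Random(seed)
    # Start from a uniformly random binary vector.
    y = [rng.randint(0, 1) for _ in range(num_labels)]
    e_y = energy(y)
    samples = []
    for _ in range(num_steps):
        # Uniform proposal: draw a fresh binary vector uniformly at random.
        y_prop = [rng.randint(0, 1) for _ in range(num_labels)]
        e_prop = energy(y_prop)
        # Accept with probability min(1, exp(e_y - e_prop)); the uniform
        # proposal is symmetric, so no proposal-ratio correction is needed.
        if math.log(rng.random() + 1e-12) < e_y - e_prop:
            y, e_y = y_prop, e_prop
        samples.append(list(y))
    return samples

# Toy energy (illustrative only): prefers vectors with few active labels.
toy_energy = lambda y: 2.0 * sum(y)
draws = metropolis_hastings_uniform(toy_energy, num_labels=4, num_steps=1000)
avg_active = sum(sum(y) for y in draws) / len(draws)
```

Because the uniform proposal is symmetric, the acceptance ratio reduces to a pure energy difference, which is what makes this baseline simple to implement but slow to mix in large label spaces.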
Researcher Affiliation | Industry | Google DeepMind. Correspondence to: Mathieu Blondel <EMAIL>, Michael E. Sander <EMAIL>.
Pseudocode | Yes | Algorithm 1: Doubly stochastic objective value computation.
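The pseudocode of Algorithm 1 is not reproduced on this page. As loose context only, "doubly stochastic" typically means randomness over both the data minibatch and the labels sampled to estimate the partition-function term. A hypothetical sketch under that assumption (function names, signatures, and the uniform importance-sampling estimator here are illustrative, not the paper's algorithm):

```python
import math
import random

def doubly_stochastic_objective(energy, data, label_space, batch_size,
                                num_label_samples, rng):
    """Hypothetical doubly stochastic objective estimate: sample a data
    minibatch AND a subset of labels to estimate the log-partition term.
    Illustrative sketch only, not the paper's Algorithm 1."""
    batch = rng.sample(data, batch_size)
    total = 0.0
    for x, y in batch:
        # Monte Carlo estimate of Z(x) from uniformly sampled labels.
        sampled = [rng.choice(label_space) for _ in range(num_label_samples)]
        z_hat = (len(label_space) / num_label_samples) * sum(
            math.exp(-energy(x, y_s)) for y_s in sampled)
        # Negative log-likelihood term: energy plus estimated log-partition.
        total += energy(x, y) + math.log(z_hat)
    return total / batch_size

# Toy usage: scalar inputs, two candidate labels.
toy_energy = lambda x, y: (x - y) ** 2
data = [(0.0, 0), (1.0, 1)]
val = doubly_stochastic_objective(toy_energy, data, label_space=[0, 1],
                                  batch_size=2, num_label_samples=4,
                                  rng=random.Random(0))
```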
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code for the methodology described. It mentions using JAX for implementation, but this is a third-party tool.
Open Datasets | Yes | Multilabel classification datasets. We use the same datasets as in Blondel et al. (2022). The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. ... Label ranking. The publicly-available datasets can be downloaded from https://github.com/akorba/Structured_Approach_Label_Ranking.
Dataset Splits | Yes | The dataset characteristics are described in Table 3 below.

Table 3. Dataset Characteristics

Dataset   | Type        | Train  | Dev   | Test   | Features | Classes | Avg. labels
Birds     | Audio       | 134    | 45    | 172    | 260      | 19      | 1.96
Cal500    | Music       | 376    | 126   | 101    | 68       | 174     | 25.98
Emotions  | Music       | 293    | 98    | 202    | 72       | 6       | 1.82
Mediamill | Video       | 22,353 | 7,451 | 12,373 | 120      | 101     | 4.54
Scene     | Images      | 908    | 303   | 1,196  | 294      | 6       | 1.06
Yeast     | Micro-array | 1,125  | 375   | 917    | 103      | 14      | 4.17
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU models, CPU specifications) used for running the experiments.
Software Dependencies | No | Our implementation is made using JAX (Bradbury et al., 2018). ... We use the Adam optimizer (Kingma, 2014)... The paper mentions JAX and Adam but does not provide specific version numbers for these software dependencies or any other libraries.
Experiment Setup | Yes | Convergence curves are in Figure 3. We use a linear model for g (unary model), and an MLP for τ with ReLU activation and a hidden dimension of 128. Models are trained with the logistic loss. We use the Adam optimizer (Kingma, 2014) with a learning rate of 1e-4 for the parameters of both g and τ. The models are trained for 5000 steps with full batch w.r.t. (x_i, y_i) pairs.
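The quoted setup trains both g and τ with Adam at learning rate 1e-4. As a reference point for that optimizer choice, a single Adam update (Kingma & Ba) can be sketched in plain Python; this is a minimal illustration on flat parameter lists, not the paper's JAX implementation:

```python
def adam_step(params, grads, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on flat lists of scalar parameters.
    m and v are the first- and second-moment estimates; t is the
    1-indexed step count used for bias correction."""
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = b1 * mi + (1 - b1) * g          # first-moment EMA
        vi = b2 * vi + (1 - b2) * g * g      # second-moment EMA
        m_hat = mi / (1 - b1 ** t)           # bias correction
        v_hat = vi / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (v_hat ** 0.5 + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v

# One step minimizing f(x) = x**2 starting from x = 1.0.
p, m, v = [1.0], [0.0], [0.0]
g = [2.0 * p[0]]  # gradient of x**2
p, m, v = adam_step(p, g, m, v, t=1)
```

With bias correction at t=1, the very first update has magnitude close to the learning rate itself (here about 1e-4), regardless of the raw gradient scale, which is why Adam tolerates poorly scaled gradients early in training.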