Joint Learning of Energy-based Models and their Partition Function

Authors: Michael Eli Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our approach on multilabel classification and label ranking. ... We evaluate our models on classical multilabel classification datasets. ... Our results in Table 1 show that the logistic and sparsemax losses trained with our approach work better than the generalized Fenchel-Young loss as well as min-max and MCMC sampling approaches in various configurations. For the min-max approach, we use optimistic ADAM as solver, an MLP as generator and we use REINFORCE (score function estimator) for gradient estimation. For MCMC sampling, we use the standard Metropolis-Hastings algorithm with a uniform proposal distribution. We also present learning curves in Figure 1.
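For context on the MCMC baseline quoted above, here is a minimal sketch of Metropolis-Hastings sampling with a uniform proposal over binary label vectors. The energy function and dimensionality are illustrative placeholders, not the paper's actual model:

```python
import math
import random

def metropolis_hastings_uniform(energy, num_labels, num_steps, seed=0):
    """Sample binary label vectors y from p(y) proportional to exp(-energy(y))
    using Metropolis-Hastings with a uniform proposal distribution."""
    rng = random.Random(seed)
    # Start from a uniformly random binary vector.
    y = [rng.randint(0, 1) for _ in range(num_labels)]
    e_y = energy(y)
    samples = []
    for _ in range(num_steps):
        # Uniform proposal: draw a fresh binary vector uniformly at random.
        y_prop = [rng.randint(0, 1) for _ in range(num_labels)]
        e_prop = energy(y_prop)
        # Accept with probability min(1, exp(e_y - e_prop)); the uniform
        # proposal is symmetric, so no proposal-ratio correction is needed.
        if math.log(rng.random() + 1e-12) < e_y - e_prop:
            y, e_y = y_prop, e_prop
        samples.append(list(y))
    return samples

# Toy energy (illustrative only): prefers vectors with few active labels.
toy_energy = lambda y: 2.0 * sum(y)
draws = metropolis_hastings_uniform(toy_energy, num_labels=4, num_steps=1000)
avg_active = sum(sum(y) for y in draws) / len(draws)
```

Because the uniform proposal is symmetric, the acceptance ratio reduces to a pure energy difference, which is what makes this baseline simple to implement but slow to mix in large label spaces.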
Researcher Affiliation | Industry | Google DeepMind. Correspondence to: Mathieu Blondel <EMAIL>, Michael E. Sander <EMAIL>.
Pseudocode | Yes | Algorithm 1: Doubly stochastic objective value computation.
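The pseudocode of Algorithm 1 is not reproduced on this page. As loose context only, "doubly stochastic" typically means randomness over both the data minibatch and the labels sampled to estimate the partition-function term. A hypothetical sketch under that assumption (function names, signatures, and the uniform importance-sampling estimator here are illustrative, not the paper's algorithm):

```python
import math
import random

def doubly_stochastic_objective(energy, data, label_space, batch_size,
                                num_label_samples, rng):
    """Hypothetical doubly stochastic objective estimate: sample a data
    minibatch AND a subset of labels to estimate the log-partition term.
    Illustrative sketch only, not the paper's Algorithm 1."""
    batch = rng.sample(data, batch_size)
    total = 0.0
    for x, y in batch:
        # Monte Carlo estimate of Z(x) from uniformly sampled labels.
        sampled = [rng.choice(label_space) for _ in range(num_label_samples)]
        z_hat = (len(label_space) / num_label_samples) * sum(
            math.exp(-energy(x, y_s)) for y_s in sampled)
        # Negative log-likelihood term: energy plus estimated log-partition.
        total += energy(x, y) + math.log(z_hat)
    return total / batch_size

# Toy usage: scalar inputs, two candidate labels.
toy_energy = lambda x, y: (x - y) ** 2
data = [(0.0, 0), (1.0, 1)]
val = doubly_stochastic_objective(toy_energy, data, label_space=[0, 1],
                                  batch_size=2, num_label_samples=4,
                                  rng=random.Random(0))
```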
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code for the methodology described. It mentions using JAX for implementation, but this is a third-party tool.
Open Datasets | Yes | Multilabel classification datasets. We use the same datasets as in Blondel et al. (2022). The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. ... Label ranking. The publicly-available datasets can be downloaded from https://github.com/akorba/Structured_Approach_Label_Ranking.
Dataset Splits | Yes | The dataset characteristics are described in Table 3 below.

Table 3. Dataset Characteristics

Dataset   | Type        | Train  | Dev   | Test   | Features | Classes | Avg. labels
Birds     | Audio       | 134    | 45    | 172    | 260      | 19      | 1.96
Cal500    | Music       | 376    | 126   | 101    | 68       | 174     | 25.98
Emotions  | Music       | 293    | 98    | 202    | 72       | 6       | 1.82
Mediamill | Video       | 22,353 | 7,451 | 12,373 | 120      | 101     | 4.54
Scene     | Images      | 908    | 303   | 1,196  | 294      | 6       | 1.06
Yeast     | Micro-array | 1,125  | 375   | 917    | 103      | 14      | 4.17
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU models, CPU specifications) used for running the experiments.
Software Dependencies | No | Our implementation is made using JAX (Bradbury et al., 2018). ... We use the Adam optimizer (Kingma, 2014)... The paper mentions JAX and Adam but does not provide specific version numbers for these software dependencies or any other libraries.
Experiment Setup | Yes | Convergence curves are in Figure 3. We use a linear model for g (unary model), and an MLP for τ with ReLU activation and a hidden dimension of 128. Models are trained with the logistic loss. We use the Adam optimizer (Kingma, 2014) with a learning rate of 1e-4 for the parameters of both g and τ. The models are trained for 5000 steps with full batch w.r.t. (x_i, y_i) pairs.
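The quoted setup trains both g and τ with Adam at learning rate 1e-4. As a reference point for that optimizer choice, a single Adam update (Kingma & Ba) can be sketched in plain Python; this is a minimal illustration on flat parameter lists, not the paper's JAX implementation:

```python
def adam_step(params, grads, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on flat lists of scalar parameters.
    m and v are the first- and second-moment estimates; t is the
    1-indexed step count used for bias correction."""
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = b1 * mi + (1 - b1) * g          # first-moment EMA
        vi = b2 * vi + (1 - b2) * g * g      # second-moment EMA
        m_hat = mi / (1 - b1 ** t)           # bias correction
        v_hat = vi / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (v_hat ** 0.5 + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v

# One step minimizing f(x) = x**2 starting from x = 1.0.
p, m, v = [1.0], [0.0], [0.0]
g = [2.0 * p[0]]  # gradient of x**2
p, m, v = adam_step(p, g, m, v, t=1)
```

With bias correction at t=1, the very first update has magnitude close to the learning rate itself (here about 1e-4), regardless of the raw gradient scale, which is why Adam tolerates poorly scaled gradients early in training.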