Joint Learning of Energy-based Models and their Partition Function
Authors: Michael Eli Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our approach on multilabel classification and label ranking. ... We evaluate our models on classical multilabel classification datasets. ... Our results in Table 1 show that the logistic and sparsemax losses trained with our approach work better than the generalized Fenchel-Young loss as well as min-max and MCMC sampling approaches in various configurations. For the min-max approach, we use optimistic Adam as solver, an MLP as generator, and we use REINFORCE (score function estimator) for gradient estimation. For MCMC sampling, we use the standard Metropolis-Hastings algorithm with a uniform proposal distribution. We also present learning curves in Figure 1. |
| Researcher Affiliation | Industry | 1Google DeepMind. Correspondence to: Mathieu Blondel <EMAIL>, Michael E. Sander <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Doubly stochastic objective value computation |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code for the methodology described. It mentions using JAX for implementation, but this is a third-party tool. |
| Open Datasets | Yes | Multilabel classification datasets. We use the same datasets as in Blondel et al. (2022). The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. ... Label ranking. The publicly-available datasets can be downloaded from https://github.com/akorba/Structured_Approach_Label_Ranking. |
| Dataset Splits | Yes | The dataset characteristics are described in Table 3 (Dataset / Type / Train / Dev / Test / Features / Classes / Avg. labels): Birds (Audio): 134 / 45 / 172, 260 features, 19 classes, 1.96 avg. labels; Cal500 (Music): 376 / 126 / 101, 68 features, 174 classes, 25.98; Emotions (Music): 293 / 98 / 202, 72 features, 6 classes, 1.82; Mediamill (Video): 22,353 / 7,451 / 12,373, 120 features, 101 classes, 4.54; Scene (Images): 908 / 303 / 1,196, 294 features, 6 classes, 1.06; Yeast (Micro-array): 1,125 / 375 / 917, 103 features, 14 classes, 4.17. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU models, CPU specifications) used for running the experiments. |
| Software Dependencies | No | Our implementation is made using JAX (Bradbury et al., 2018). ... We use the Adam optimizer (Kingma, 2014)... The paper mentions JAX and Adam but does not provide specific version numbers for these software dependencies or any other libraries. |
| Experiment Setup | Yes | Convergence curves are in Figure 3. We use a linear model for g (unary model), and an MLP for τ with ReLU activation and a hidden dimension of 128. Models are trained with the logistic loss. We use the Adam optimizer (Kingma, 2014) with a learning rate of 10^-4 for the parameters of both g and τ. The models are trained for 5000 steps with full batch w.r.t. (xi, yi) pairs. |
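The MCMC baseline quoted above uses standard Metropolis-Hastings with a uniform proposal. A minimal sketch of that sampler for an energy-based model over binary label vectors (the multilabel setting) is below; this is an illustrative reconstruction, not the authors' code, and the toy `energy` function is a hypothetical stand-in for the learned model.

```python
import numpy as np

def metropolis_hastings(energy, k, n_steps=1000, rng=None):
    """Sample y in {0,1}^k from p(y) ∝ exp(-energy(y)) using
    Metropolis-Hastings with a uniform proposal over {0,1}^k."""
    rng = np.random.default_rng(rng)
    y = rng.integers(0, 2, size=k)           # arbitrary initial state
    e_y = energy(y)
    for _ in range(n_steps):
        y_prop = rng.integers(0, 2, size=k)  # uniform proposal: q(y'|y) = 2^-k
        e_prop = energy(y_prop)
        # The proposal is symmetric, so the acceptance probability
        # reduces to min(1, exp(e_y - e_prop)).
        if rng.random() < np.exp(min(0.0, e_y - e_prop)):
            y, e_y = y_prop, e_prop
    return y

# Toy energy: Hamming distance to a target label pattern,
# so samples concentrate around `target`.
target = np.array([1, 0, 1, 0])
sample = metropolis_hastings(lambda y: np.sum(np.abs(y - target)),
                             k=4, n_steps=500, rng=0)
```

With a uniform proposal, mixing is slow in high dimensions, which is one reason the paper's approach can outperform this baseline.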
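The min-max baseline uses REINFORCE (the score-function estimator) for gradient estimation through discrete samples. A minimal sketch of that estimator for a categorical distribution parameterized by logits is below; the `reward` function here is a hypothetical placeholder, not part of the paper's objective.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def reinforce_grad(logits, reward, n_samples=1000, rng=None):
    """Monte-Carlo estimate of ∇_logits E_{y~p}[reward(y)] via the
    score-function identity E[reward(y) ∇_logits log p(y)]."""
    rng = np.random.default_rng(rng)
    p = softmax(logits)
    grad = np.zeros_like(logits)
    for _ in range(n_samples):
        y = rng.choice(len(p), p=p)
        # For softmax logits: ∇_logits log p(y) = onehot(y) - p
        score = -p.copy()
        score[y] += 1.0
        grad += reward(y) * score
    return grad / n_samples

# Example: reward 1 for class 0, else 0.
# The estimated gradient should push logit 0 up and the others down.
g = reinforce_grad(np.zeros(3), reward=lambda y: float(y == 0),
                   n_samples=5000, rng=0)
```

The estimator is unbiased but high-variance, which in practice often requires baselines or many samples; this is the usual motivation for avoiding score-function gradients when a closed-form alternative exists.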