Difference-of-submodular Bregman Divergence
Authors: Masanari Kimura, Takahiro Kawashima, Tasuku Soma, Hideitsu Hino
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments show that learnable difference-of-submodular Bregman divergences can capture the crucial structure and significantly improve the performance of existing methods in downstream tasks. We consider revealing the behavior of the DBD from numerical experiments. The proposed method significantly improves the performance of existing methods on tasks such as clustering and set retrieval problems. |
| Researcher Affiliation | Collaboration | Masanari Kimura (School of Mathematics and Statistics, The University of Melbourne, Victoria, Australia); Takahiro Kawashima (ZOZO Research, ZOZO NEXT, Inc., Chiba, Japan); Tasuku Soma & Hideitsu Hino (Institute of Statistical Mathematics / RIKEN AIP, Tokyo, Japan) |
| Pseudocode | No | The paper describes algorithms such as k-means and mentions the use of Adam as an optimization algorithm, but it does not present structured pseudocode or algorithm blocks for its own methodology. |
| Open Source Code | Yes | REPRODUCIBILITY We provide the code needed to reproduce all experiments in the supplementary material attached. |
| Open Datasets | Yes | MNIST (Deng, 2012): The MNIST dataset is a collection of handwritten digits... ModelNet40 (Wu et al., 2015): The ModelNet40 dataset consists of 12,311 meshes in 40 categories... |
| Dataset Splits | Yes | MNIST (Deng, 2012): ...a training set of 60,000 examples and a test set of 10,000 examples. ModelNet40 (Wu et al., 2015): ...12,311 meshes in 40 categories (such as airplane and chair), of which 9,843 are used for training, while the remaining 2,468 are reserved for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2015) as the optimization algorithm' and 'fully-connected multi-layer perceptron (MLP)', but does not provide specific version numbers for any software, libraries, or frameworks used. |
| Experiment Setup | Yes | For learning the DBD, we use the triplet loss (Hoffer & Ailon, 2018)... We also use Adam (Kingma & Ba, 2015) as the optimization algorithm... with a batch size of 64 and a learning rate of 0.001 unless otherwise stated. The extreme point is taken as the subgradient h^1_Y (Edmonds, 1970)... and the grow, shrink, and bar supergradients are taken as g^2_Y. For both f^1 and f^2, the MLPs consist of two hidden layers of 64 units. For the activation functions, the ReLU is used for the hidden and final layers. |
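The experiment-setup row pins down a concrete training recipe: triplet loss, two-hidden-layer MLPs of 64 units with ReLU, and Adam with batch size 64 and learning rate 0.001. A minimal NumPy sketch of that embedding network and loss is below; the output dimension (16) and the squared-Euclidean distance are illustrative assumptions, since in the paper the learned difference-of-submodular Bregman divergence would play the role of the distance.

```python
import numpy as np

def init_mlp(in_dim, hidden=64, out_dim=16, seed=0):
    """Two hidden layers of 64 units, per the paper's setup.
    out_dim=16 is an assumed embedding size, not stated in the paper."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    return [(rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def mlp_forward(x, params):
    """ReLU on the hidden and final layers, matching the stated setup."""
    h = x
    for W, b in params:
        h = np.maximum(0.0, h @ W + b)
    return h

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss (Hoffer & Ailon, 2018): pull anchors toward positives,
    push them from negatives. Squared Euclidean distance stands in for the
    learned divergence here."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

# Hypothetical usage with a batch of 64, as in the reported configuration
# (optimization would use Adam with lr=0.001; omitted in this sketch).
params = init_mlp(in_dim=8)
batch = np.random.default_rng(1).normal(size=(64, 8))
embeddings = mlp_forward(batch, params)
```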