Keep your distance: learning dispersed embeddings on $\mathbb{S}_{m}$
Authors: Evgeniia Tokarchuk, Hua Chang Bakker, Vlad Niculae
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate (§4) old and new methods on synthetic small and large scale problems, as well as real-world large-scale applications in computer vision and natural language processing, revealing different trade-offs and throughout confirming the importance of representation dispersion for task performance. ... We demonstrate the application of dispersion objectives and provide a comparative analysis on both synthetic and real-world tasks. |
| Researcher Affiliation | Academia | Evgeniia Tokarchuk (Language Technology Lab, University of Amsterdam), Hua Chang Bakker (University of Amsterdam), Vlad Niculae (Language Technology Lab, University of Amsterdam) |
| Pseudocode | No | The paper describes algorithms like Lloyd's algorithm and Sliced Dispersion through mathematical formulations and textual descriptions (e.g., in Sections 3.2 and 3.3), but it does not include any clearly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | A reusable library for spherical dispersion is available as open-source software: https://github.com/ltl-uva/ledoh-torch |
| Open Datasets | Yes | Mettes et al. (2019) showed that learning prototypes with dispersion encouraged by minimizing the maximum cosine similarity on a hypersphere improves classification results on ImageNet-200 (Le & Yang, 2015). ... We report results on two WMT translation tasks: WMT 2016 Romanian–English (ro-en) with 612K training samples and WMT 2019 English–German (en-de) with 9.1M training samples (including back-translated data). |
| Dataset Splits | Yes | We report results on two WMT translation tasks: WMT 2016 Romanian–English (ro-en) with 612K training samples and WMT 2019 English–German (en-de) with 9.1M training samples (including back-translated data). We measure translation accuracy on the best checkpoint according to validation BLEU score using SacreBLEU (Papineni et al., 2002; Post, 2018) and COMET (Rei et al., 2020). |
| Hardware Specification | Yes | The authors also thank SURF (www.surf.nl) for the support in using the National Supercomputer Snellius. |
| Software Dependencies | Yes | We used the fairseq (Ott et al., 2019) framework for training our models. Baseline discrete models (Euclidean baseline) are trained with cross-entropy loss, label smoothing equal to 0.1 and effective batch size 65.5K tokens. All models are trained with learning rate 5 × 10⁻⁴ and 10k warm-up steps for 50k steps in total. ... We used SacreBLEU (Post, 2018) with the following signature nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.3.1 and COMET (Rei et al., 2020) with the unbabel-comet library version 2.2.2 and the Unbabel/wmt22-comet-da model. |
| Experiment Setup | Yes | Baseline discrete models (Euclidean baseline) are trained with cross-entropy loss, label smoothing equal to 0.1 and effective batch size 65.5K tokens. All models are trained with learning rate 5 × 10⁻⁴ and 10k warm-up steps for 50k steps in total. Spherical baseline and models with dispersion regularizer are trained by defining the decoder's embedding layer as a manifold parameter. We tune the learning rate for Riemannian Adam (Bécigneul & Ganea, 2019) in the range {5 × 10⁻⁵, 5 × 10⁻⁴, 5 × 10⁻³} and report results with the learning rate 5 × 10⁻³. |
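The Open Datasets row quotes the paper's description of dispersion via minimizing the maximum pairwise cosine similarity on the hypersphere (following Mettes et al., 2019). A minimal NumPy sketch of that criterion is below; the function name `max_cosine_dispersion` is a hypothetical illustration, not the API of the authors' ledoh-torch library, and a real training setup would compute this differentiably (e.g. in PyTorch) as a regularizer.

```python
import numpy as np

def max_cosine_dispersion(embeddings: np.ndarray) -> float:
    """Maximum pairwise cosine similarity of a set of embeddings.

    Lower is better: minimizing this value pushes the points apart
    on the unit sphere (the dispersion criterion quoted above).
    """
    # Project rows onto the unit sphere so dot products are cosines.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = X @ X.T
    # Exclude self-similarity (always 1) before taking the maximum.
    np.fill_diagonal(sims, -np.inf)
    return float(sims.max())
```

For two antipodal points the objective reaches its minimum of -1, while duplicated embeddings score the worst possible value of 1, matching the intuition that dispersion penalizes collapsed representations.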