Keep your distance: learning dispersed embeddings on $\mathbb{S}_{m}$

Authors: Evgeniia Tokarchuk, Hua Chang Bakker, Vlad Niculae

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate (§4) old and new methods on synthetic small- and large-scale problems, as well as real-world large-scale applications in computer vision and natural language processing, revealing different trade-offs and throughout confirming the importance of representation dispersion for task performance. ... We demonstrate the application of dispersion objectives and provide a comparative analysis on both synthetic and real-world tasks."
Researcher Affiliation | Academia | Evgeniia Tokarchuk (EMAIL), Language Technology Lab, University of Amsterdam; Hua Chang Bakker (EMAIL), University of Amsterdam; Vlad Niculae (EMAIL), Language Technology Lab, University of Amsterdam
Pseudocode | No | The paper describes algorithms such as Lloyd's algorithm and Sliced Dispersion through mathematical formulations and textual descriptions (e.g., in Sections 3.2 and 3.3), but it does not include any clearly labeled pseudocode blocks or algorithm figures.
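For orientation, the kind of procedure the paper describes only in prose can be sketched compactly. Below is one iteration of the generic, textbook Lloyd's algorithm (standard k-means in one dimension) — a hypothetical illustration for readers unfamiliar with the method, not the paper's spherical variant or the authors' code:

```python
def lloyd_step(points, centers):
    # One Lloyd iteration: assign each point to its nearest center,
    # then move each center to the mean of its assigned points.
    clusters = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[i].append(p)
    # Empty clusters keep their previous center.
    return [sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)]

centers = lloyd_step([0.0, 1.0, 9.0, 10.0], [0.0, 10.0])
print(centers)  # → [0.5, 9.5]
```

Iterating this step to a fixed point yields the usual k-means clustering; the paper adapts the same assign-then-recenter idea to points constrained to the hypersphere.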
Open Source Code | Yes | A reusable library for spherical dispersion is available as open-source software: https://github.com/ltl-uva/ledoh-torch
Open Datasets | Yes | "Mettes et al. (2019) showed that learning prototypes with dispersion encouraged by minimizing the maximum cosine similarity on a hypersphere improves classification results on ImageNet-200 (Le & Yang, 2015). ... We report results on two WMT translation tasks: WMT 2016 Romanian–English (ro-en) with 612K training samples and WMT 2019 English–German (en-de) with 9.1M training samples (including back-translated data)."
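The dispersion criterion quoted above — minimizing the maximum pairwise cosine similarity on the hypersphere — can be sketched in plain Python. This is a hypothetical minimal illustration of the objective, not code from the paper or from ledoh-torch; a training loop would minimize the returned value:

```python
import math

def normalize(v):
    # Project a vector onto the unit hypersphere.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def max_cosine_similarity(vectors):
    # Dispersion surrogate: the largest pairwise cosine similarity.
    # Minimizing this pushes the closest pair of embeddings apart.
    vs = [normalize(v) for v in vectors]
    best = -1.0
    for i in range(len(vs)):
        for j in range(i + 1, len(vs)):
            best = max(best, sum(a * b for a, b in zip(vs[i], vs[j])))
    return best

# Example: three points on the circle; two coincide after
# normalization, so the maximum cosine similarity is 1.0.
points = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
print(round(max_cosine_similarity(points), 6))  # → 1.0
```

In practice this would be computed on embedding matrices with a tensor library, but the O(n²) pairwise structure is the same.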
Dataset Splits | Yes | "We report results on two WMT translation tasks: WMT 2016 Romanian–English (ro-en) with 612K training samples and WMT 2019 English–German (en-de) with 9.1M training samples (including back-translated data). We measure translation accuracy on the best checkpoint according to validation BLEU score using SacreBLEU (Papineni et al., 2002; Post, 2018) and COMET (Rei et al., 2020)."
Hardware Specification | Yes | "The authors also thank SURF (www.surf.nl) for the support in using the National Supercomputer Snellius."
Software Dependencies | Yes | "We used the fairseq (Ott et al., 2019) framework for training our models. Baseline discrete models (Euclidean baseline) are trained with cross-entropy loss, label smoothing of 0.1, and an effective batch size of 65.5K tokens. All models are trained with learning rate 5×10⁻⁴ and 10k warm-up steps, for 50k steps in total. ... We used SacreBLEU (Post, 2018) with the following signature nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.3.1 and COMET (Rei et al., 2020) with the unbabel-comet library version 2.2.2 and the Unbabel-wmt22-comet-da model."
Experiment Setup | Yes | "Baseline discrete models (Euclidean baseline) are trained with cross-entropy loss, label smoothing of 0.1, and an effective batch size of 65.5K tokens. All models are trained with learning rate 5×10⁻⁴ and 10k warm-up steps, for 50k steps in total. Spherical baseline and models with the dispersion regularizer are trained by defining the decoder's embedding layer as a manifold parameter. We tune the learning rate for Riemannian Adam (Bécigneul & Ganea, 2019) in the range [5×10⁻⁵, 5×10⁻⁴, 5×10⁻³] and report results with learning rate 5×10⁻³."
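Treating the decoder's embedding layer as a "manifold parameter" means its updates stay on the hypersphere. A minimal sketch of one Riemannian gradient-descent step on the unit sphere is shown below — project the Euclidean gradient onto the tangent space, step, then retract by renormalizing. This is a simplified plain-gradient illustration of the general mechanism, not Riemannian Adam and not the authors' training code:

```python
import math

def sphere_step(x, grad, lr):
    # One Riemannian gradient-descent step on the unit sphere:
    # 1) project the Euclidean gradient onto the tangent space at x,
    # 2) take a gradient step in that tangent direction,
    # 3) retract back onto the sphere by renormalizing.
    dot = sum(a * g for a, g in zip(x, grad))
    tangent = [g - dot * a for a, g in zip(x, grad)]
    moved = [a - lr * t for a, t in zip(x, tangent)]
    norm = math.sqrt(sum(a * a for a in moved))
    return [a / norm for a in moved]

x = [1.0, 0.0]
x = sphere_step(x, [0.0, -1.0], lr=0.1)  # gradient pulls toward (0, 1)
print(round(math.hypot(*x), 6))  # the iterate stays on the circle → 1.0
```

Riemannian Adam adds momentum and per-coordinate scaling on top of this project-step-retract loop, which is why its stable learning-rate range can differ from the Euclidean optimizer's.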