TLDR: Twin Learning for Dimensionality Reduction

Authors: Yannis Kalantidis, Carlos Eduardo Rosar Kos Lassance, Jon Almazán, Diane Larlus

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a large set of ablations and experimental results on common benchmarks for image retrieval, as well as on the natural language processing task of argument retrieval. We show that one can achieve significant gains without altering the encoding and search complexity: for example, we can improve landmark image retrieval with GeM-AP (Revaud et al., 2019) on ROxford5K (Radenović et al., 2018a) by almost 4 mAP points for 128 dimensions, a commonly used dimensionality, by simply replacing PCA with a linear TLDR encoder. Similarly, we are able to improve the state-of-the-art retrieval performance of DINO (Caron et al., 2021) representations on ImageNet (Russakovsky et al., 2015), even when compressing the vectors tenfold.
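The quoted gain comes from swapping a PCA projection for a learned linear encoder of the same shape, so search-time cost is identical (one matrix multiply). As context, here is a minimal NumPy sketch of the PCA baseline being replaced; the data shapes and names are illustrative, not the paper's code:

```python
import numpy as np

def pca_fit(X, d):
    """Fit a d-dimensional PCA projection; returns (mean, components)."""
    mu = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:d]

def pca_project(X, mu, W):
    """Center and project onto the top-d principal directions."""
    return (X - mu) @ W.T

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2048))   # toy stand-in for high-dim descriptors
mu, W = pca_fit(X, 128)             # reduce to 128 dimensions
Z = pca_project(X, mu, W)
print(Z.shape)                      # (1000, 128)
```

A linear TLDR encoder has exactly this `(input_dim, 128)` matrix form at inference time, which is why the paper can claim unchanged encoding and search complexity.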
Researcher Affiliation | Industry | Yannis Kalantidis (EMAIL), NAVER LABS Europe; Carlos Lassance (EMAIL), NAVER LABS Europe; Jon Almazán (EMAIL), NAVER LABS Europe; Diane Larlus (EMAIL), NAVER LABS Europe
Pseudocode | Yes | An overview is provided in Figure 1 and in Algorithm 1. ... In Algorithm 2 we show the pseudocode of TLDR, which includes initialization, training, and projection.
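The paper's Algorithm 2 is not reproduced in this report. For orientation, below is a rough NumPy sketch of the core idea: pair each point with one of its k nearest neighbors and train a linear encoder with a Barlow Twins-style cross-correlation loss. All names, sizes, and the λ value are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Cross-correlation loss between two batches of d-dim embeddings."""
    n = z1.shape[0]
    # Standardize each output dimension over the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-9)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-9)
    c = z1.T @ z2 / n                          # (d, d) cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()  # push diagonal toward 1
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # decorrelate rest
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))                 # toy high-dim features

# Exact k-NN on the toy data; each point is paired with a random neighbor.
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)     # pairwise squared distances
np.fill_diagonal(d2, np.inf)                   # exclude self-matches
nn = np.argsort(d2, axis=1)[:, :3]             # k = 3 nearest neighbors
pick = nn[np.arange(256), rng.integers(0, 3, 256)]

W = rng.normal(size=(64, 32)) / 8              # linear encoder to 32-D
loss = barlow_twins_loss(X @ W, X[pick] @ W)   # scalar loss to minimize
```

In the actual method, gradient descent on this loss (with the projector head described in the paper) replaces the closed-form PCA fit.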
Open Source Code Yes Code available at: https://github.com/naver/tldr
Open Datasets | Yes | A summary of all the tasks, datasets and representations that we consider in this section is presented in Table 1. We explore tasks like landmark image retrieval on datasets like ROxford, RParis (Radenović et al., 2018a), or k-NN retrieval on ImageNet (Russakovsky et al., 2015). ... To learn the dimensionality reduction function, we use GLD-v2, a dataset composed of 1.5 million landmark images (Weyand et al., 2020).
Dataset Splits | Yes | Specifically, for landmark image retrieval on ROxford/RParis we use the common protocols presented by Radenović et al. (2018a), where specific queries are defined. ... For ImageNet (Russakovsky et al., 2015), we follow the exact process used by Caron et al. (2021) and others: The gallery is composed of the full validation set, spanning 1000 classes, and each image from this validation (val) set is used in turn as a query.
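The ImageNet protocol quoted above is a leave-one-out k-NN evaluation: every validation vector queries all the others. A schematic NumPy version on toy data (function name and data are illustrative):

```python
import numpy as np

def knn_accuracy(Z, labels, k=1):
    """Leave-one-out k-NN accuracy: each row of Z queries all other rows."""
    sims = Z @ Z.T                      # cosine sim if rows are L2-normalized
    np.fill_diagonal(sims, -np.inf)     # a query must not retrieve itself
    nn = np.argsort(-sims, axis=1)[:, :k]
    votes = labels[nn]                  # (n, k) labels of retrieved neighbors
    pred = np.array([np.bincount(v).argmax() for v in votes])
    return (pred == labels).mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=200)
# Toy embeddings: one centroid per class plus noise, L2-normalized.
Z = np.eye(10)[labels] + 0.1 * rng.normal(size=(200, 10))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
acc = knn_accuracy(Z, labels, k=3)      # close to 1.0 on this easy toy data
```

The same routine evaluates either the full-dimensional representations or their compressed versions, which is how compression quality is compared.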
Hardware Specification | Yes | When it comes to training time, TLDR is noticeably slower than PCA. For example, for d = 128 and for the GLD-v2 dataset, learning PCA takes approximately 18 minutes (multi-core), while learning a linear TLDR over 100 epochs takes approximately 63 minutes (on a single GPU). The latter includes 13 minutes for computing (exact) k-NNs for the dataset. ... We used 16 CPUs and 300GB memory for all manifold learning methods, and one 32GB V100 GPU for TLDR.
Software Dependencies | No | The paper mentions 'scikit-learn' for the PCA implementation, 'PyTorch-style pseudocode', and the 'pymde' library for visualizations, but it does not specify version numbers for any of these software components.
Experiment Setup | Yes | It is noteworthy that we used the exact same hyper-parameters for the learning rate, weight decay, scaling, and λ suggested in Zbontar et al. (2021), despite having very different tasks and encoder architectures. ... For all flavours of TLDR, we fix the number of nearest neighbors to k = 3, although, and as we show in Figure C, TLDR performs well for a wide range of number of neighbors.