TLDR: Twin Learning for Dimensionality Reduction
Authors: Yannis Kalantidis, Carlos Eduardo Rosar Kos Lassance, Jon Almazán, Diane Larlus
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a large set of ablations and experimental results on common benchmarks for image retrieval, as well as on the natural language processing task of argument retrieval. We show that one can achieve significant gains without altering the encoding and search complexity: for example, we can improve landmark image retrieval with GeM-AP (Revaud et al., 2019) on ROxford5K (Radenović et al., 2018a) by almost 4 mAP points for 128 dimensions, a commonly used dimensionality, by simply replacing PCA with a linear TLDR encoder. Similarly, we are able to improve the state-of-the-art retrieval performance of DINO (Caron et al., 2021) representations on ImageNet (Russakovsky et al., 2015), even when compressing the vectors tenfold. |
| Researcher Affiliation | Industry | Yannis Kalantidis (NAVER LABS Europe), Carlos Lassance (NAVER LABS Europe), Jon Almazán (NAVER LABS Europe), Diane Larlus (NAVER LABS Europe) |
| Pseudocode | Yes | An overview is provided in Figure 1 and in Algorithm 1. ... In Algorithm 2 we show the pseudocode of TLDR, which includes initialization, training, and projection. |
| Open Source Code | Yes | Code available at: https://github.com/naver/tldr |
| Open Datasets | Yes | A summary of all the tasks, datasets and representations that we consider in this section is presented in Table 1. We explore tasks like landmark image retrieval on datasets like ROxford, RParis (Radenović et al., 2018a), or k-NN retrieval on ImageNet (Russakovsky et al., 2015). ... To learn the dimensionality reduction function, we use GLD-v2, a dataset composed of 1.5 million landmark images (Weyand et al., 2020). |
| Dataset Splits | Yes | Specifically, for landmark image retrieval on ROxford/RParis we use the common protocols presented by Radenović et al. (2018a), where specific queries are defined. ... For ImageNet (Russakovsky et al., 2015), we follow the exact process used by Caron et al. (2021) and others: The gallery is composed of the full validation set, spanning 1000 classes, and each image from this validation (val) set is used in turn as a query. |
| Hardware Specification | Yes | When it comes to training time, TLDR is noticeably slower than PCA. For example, for d = 128 and for the GLD-v2 dataset, learning PCA takes approximately 18 minutes (multi-core), while learning a linear TLDR over 100 epochs takes approximately 63 minutes (on a single GPU). The latter includes 13 minutes for computing (exact) k-NNs for the dataset. ... We used 16 CPUs and 300GB memory for all manifold learning methods, and one 32GB V100 GPU for TLDR. |
| Software Dependencies | No | The paper mentions 'scikit-learn' for PCA implementation, 'PyTorch-style pseudocode', and the 'pymde' library for visualizations, but it does not specify version numbers for any of these software components. |
| Experiment Setup | Yes | It is noteworthy that we used the exact same hyper-parameters for the learning rate, weight decay, scaling, and λ suggested in Zbontar et al. (2021), despite having very different tasks and encoder architectures. ... For all flavours of TLDR, we fix the number of nearest neighbors to k = 3, although, and as we show in Figure C, TLDR performs well for a wide range of number of neighbors. |
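The rows above describe the core TLDR recipe: form training pairs from each vector and its k = 3 exact nearest neighbors, then train an encoder (linear, in the simplest flavour) with the Barlow Twins redundancy-reduction loss of Zbontar et al. (2021). The following is a minimal NumPy sketch of that recipe, not the authors' released code (see their repository for that); the toy dimensions, function names, and the plain gradient-free forward pass are illustrative assumptions.

```python
import numpy as np

def knn_pairs(X, k=3):
    """Exact k-NN pairs (i, j) under squared L2, excluding self-matches."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of the k closest points
    return np.array([(i, j) for i in range(len(X)) for j in nn[i]])

def barlow_twins_loss(Za, Zb, lam=5e-3):
    """Barlow Twins objective: drive the cross-correlation matrix of the
    two projected views toward the identity (lam weighs off-diagonals)."""
    Za = (Za - Za.mean(0)) / (Za.std(0) + 1e-8)   # standardize per dimension
    Zb = (Zb - Zb.mean(0)) / (Zb.std(0) + 1e-8)
    C = Za.T @ Zb / len(Za)                       # (d x d) cross-correlation
    on_diag = ((np.diag(C) - 1.0) ** 2).sum()
    off_diag = (C ** 2).sum() - (np.diag(C) ** 2).sum()
    return on_diag + lam * off_diag

# Toy setup: 64 input vectors of dim 16, linear encoder W down to d = 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))
W = rng.normal(size=(16, 4)) / 4.0     # the linear "TLDR encoder" weights
pairs = knn_pairs(X, k=3)              # each point paired with its 3 NNs
Za = X[pairs[:, 0]] @ W                # projection of the anchor points
Zb = X[pairs[:, 1]] @ W                # projection of their neighbors
loss = barlow_twins_loss(Za, Zb)
```

In the paper this loss is minimized with a standard optimizer over the encoder weights (the sketch omits the gradient step), and, as the table notes, the relatively expensive part of setup is the one-off exact k-NN computation over the training set.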