Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks

Authors: Théophane Vallaeys, Matthew J Muckley, Jakob Verbeek, Matthijs Douze

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on four datasets to evaluate QINCO2 for vector compression and billion-scale nearest neighbor search. We obtain outstanding results in both settings, improving the state-of-the-art reconstruction MSE by 34% for 16-byte vector compression on Big ANN, and search accuracy by 24% with 8-byte encodings on Deep1M.
Researcher Affiliation | Industry | Théophane Vallaeys, Matthew Muckley, Jakob Verbeek, Matthijs Douze, FAIR, Meta
Pseudocode | No | The paper describes the architecture and processes using equations and descriptive text, but no explicit 'Pseudocode' or 'Algorithm' block is present.
Open Source Code | Yes | Code is available at: https://github.com/facebookresearch/Qinco
Open Datasets | Yes | Following Huijben et al. (2024), we evaluate QINCO2 against previous baselines on the four datasets described in Table 1, spanning across various modalities, dimensions and train set sizes.
Table 1: The datasets used in our experiments.
  Dataset | Dim. | Train vecs. | Data type
  Deep1B (Babenko & Lempitsky, 2016) | 96 | 358M | CNN image emb.
  Big ANN (Jégou et al., 2011) | 128 | 100M | SIFT descriptors
  Facebook Sim Search Net++ (FB-ssnpp) (Simhadri et al., 2021) | 256 | 10M | SSCD image emb.
  Contriever (Huijben et al., 2024) | 768 | 20M | Contriever text emb.
Dataset Splits | No | The paper mentions using a 'full training split' and 'database split' but does not provide specific percentages, sample counts, or explicit instructions for how to create these splits from the raw data for reproduction. It states: 'We use the full training split during training, and use the database split to report the compression performance (MSE) on 1M vectors, and nearest-neighbor recall percentages at rank 1 (R@1) among 1M database vectors with 10k query vectors.'
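The two evaluation metrics quoted above (reconstruction MSE and R@1 among database vectors) can be sketched with a brute-force NumPy implementation. This is an illustrative sketch only: the function names are hypothetical, and the paper's actual search pipeline is Faiss-based rather than exhaustive distance computation.

```python
import numpy as np

def mse(database, decoded):
    """Mean squared reconstruction error per vector, summed over dimensions."""
    return ((database - decoded) ** 2).sum(-1).mean()

def recall_at_1(database, queries, decoded):
    """R@1: fraction of queries whose nearest neighbor among the decoded
    (compressed) database vectors matches the ground-truth nearest
    neighbor among the exact database vectors."""
    # Ground-truth nearest neighbors on exact vectors (brute force).
    d_exact = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(-1)
    gt = d_exact.argmin(axis=1)
    # Nearest neighbors computed against the decompressed vectors.
    d_approx = ((queries[:, None, :] - decoded[None, :, :]) ** 2).sum(-1)
    pred = d_approx.argmin(axis=1)
    return (gt == pred).mean()
```

With a perfect codec (`decoded == database`), MSE is 0 and R@1 is 1.0; lossy compression trades both off against code size.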
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU models, memory) used for running the experiments. It only mentions 'single CPU' for timing in some sections, but not for the main experiments.
Software Dependencies | No | The paper mentions the 'Faiss library' and 'pytorch implementations' but does not specify their version numbers or other software dependencies with versions.
Experiment Setup | Yes | Architecture details. Unless specified otherwise, we use the model architectures listed in Table 2. We use QINCO2-L for all the vector compression experiments (Section 4.2), as we want to see the impact of large models against the best results of other methods. The smaller models are used for search experiments, where time efficiency matters as well. We set A = 16, B = 32 during training, and A = 32, B = 64 during evaluation for all models. When candidate pre-selection is used without beam search (B = 1), we use A = 32 during training. We fix the codebook size to K = 256, which results in a single-byte encoding per step.
Training. Compared to QINCO, we improve the initialization of the network and codebook weights, the dataset normalization, the optimizer and the learning rate scheduler, and increase the batch size. We also stabilize the training by adding gradient clipping, and reduce the number of dead codewords by resetting unused ones, similar to Zheng & Vedaldi (2023). Additionally, we notice that large volumes of training data are usually available for unsupervised tasks such as compression. Huijben et al. (2024) showed that more training data is beneficial to the accuracy of QINCO. Motivated by this observation, we train our models on the full training set of each benchmark (up to 100s of millions of vectors, see Table 1). Details of the training procedure can be found in App. A.2.
From Appendix A.2: 'We reduce the number of epochs to 70... We normalize each dataset with a mean of 0... We initialize the QINCO2 codebooks using noisy RQ codebooks... We use the AdamW optimizer (Loshchilov & Hutter, 2019) with default settings, except for a weight decay of 0.1... We use a gradient clipping set to 0.1... We use a maximum learning rate of 0.0008... We use a cosine scheduler... We increase the batch size to 1,024 on each of the 8 GPUs, for an effective batch size of 8,192.'
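The optimizer and scheduler settings quoted from Appendix A.2 (AdamW with weight decay 0.1, peak learning rate 0.0008, cosine schedule, gradient clipping at 0.1) can be sketched in PyTorch as follows. This is a hypothetical reconstruction from the reported hyperparameters, not the released QINCO2 training code; `model`, `loss_fn`, and the function names are placeholders.

```python
import torch

def make_optimizer_and_scheduler(model, steps_per_epoch, num_epochs=70):
    # AdamW with default settings except weight decay = 0.1 (App. A.2).
    opt = torch.optim.AdamW(model.parameters(), lr=8e-4, weight_decay=0.1)
    # Cosine learning-rate schedule over the full training run.
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(
        opt, T_max=steps_per_epoch * num_epochs)
    return opt, sched

def train_step(model, batch, opt, sched, loss_fn):
    opt.zero_grad()
    loss = loss_fn(model(batch), batch)  # reconstruction objective
    loss.backward()
    # Gradient clipping at 0.1, reported to stabilize training (App. A.2).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
    opt.step()
    sched.step()
    return loss.item()
```

The per-GPU batch size of 1,024 across 8 GPUs (effective 8,192) would typically be handled by a distributed data loader around this loop.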