$f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive Learning
Authors: Yiwei Lu, Guojun Zhang, Sun Sun, Hongyu Guo, Yaoliang Yu
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using benchmark tasks from both vision and natural language, we empirically evaluate f-MICL with different f-divergences on various architectures (SimCLR, MoCo, and MoCo v3) and datasets. We observe that f-MICL generally outperforms the benchmarks and the best-performing f-divergence is task and dataset dependent. |
| Researcher Affiliation | Collaboration | Yiwei Lu (School of Computer Science, University of Waterloo; Vector Institute); Guojun Zhang (Huawei Noah's Ark Lab); Sun Sun (School of Computer Science, University of Waterloo; National Research Council Canada); Hongyu Guo (National Research Council Canada; University of Ottawa); Yaoliang Yu (School of Computer Science, University of Waterloo; Vector Institute) |
| Pseudocode | Yes | Algorithm 1: f-MICL. Input: batch size N, function f, weighting parameter α, constant µ (in G_σ), variance σ² |
| Open Source Code | No | In this paper, we follow the implementations in SimCLR (https://github.com/sthalles/SimCLR) and MoCo v3 (https://github.com/facebookresearch/moco-v3). For fair comparison we use the experimental settings in Table 7 for all the baseline methods, which might differ from the original settings. Table 7 gives common choices of hyperparameters for different datasets. Note that we may need to further finetune α and σ for different f-divergences. See our supplementary code for more details. |
| Open Datasets | Yes | Our vision datasets include CIFAR-10 (Krizhevsky et al., 2009), STL-10 (Coates et al., 2011), Tiny ImageNet (Chrabaszcz et al., 2017), and ImageNet (Deng et al., 2009) for image classification. To show the wide applicability of our f-MICL framework, we also conduct experiments on a natural language dataset, English Wikipedia (Gao et al., 2021). |
| Dataset Splits | Yes | Evaluation metric: for vision tasks, we use k-nearest-neighbor (k-NN) evaluation (only for small datasets) and linear evaluation to evaluate the performance, based on the learned embeddings. For each sample in a dataset we create a sample pair, a.k.a. positive pair, using two different augmentation functions. For image samples, we choose the augmentation functions to be the standard ones in contrastive learning, e.g., in Chen et al. (2020) and He et al. (2020). |
| Hardware Specification | Yes | Hardware and package: We train on a GPU cluster with NVIDIA T4 and P100. |
| Software Dependencies | No | Hardware and package: We train on a GPU cluster with NVIDIA T4 and P100. The platform we use is PyTorch. Specifically, the pairwise summation can be easily implemented using torch.nn.functional.pdist from PyTorch. |
| Experiment Setup | Yes | Batch size and embedding dimension: for experiments in CIFAR-10 we choose batch size 512; for STL-10 we choose batch size 64 to accommodate one-GPU training; for Tiny ImageNet, we choose batch size 256; for ImageNet, we choose batch size 1024. For all the vision datasets, we choose the embedding dimension to be 512. Regarding the language dataset, the batch size is 64 with the feature dimension 768. Hyperparameters: in all our experiments we fix the constant factor µ = 1. We find that in practice the weight parameter α often needs to be large (e.g., in the Wikipedia dataset), which requires moderate tuning. Optimizer and learning rate scheduler: For smaller vision tasks, we use SGD with momentum for optimization and the cosine learning rate scheduler (Loshchilov & Hutter, 2017). For the ImageNet task and natural language task, we use Adam with weight decay (Loshchilov & Hutter, 2018) and the linear decay scheduler. Table 7 gives common choices of hyperparameters for different datasets. |
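The paper's remark that "the pairwise summation can be easily implemented using torch.nn.functional.pdist" can be sketched as follows. This is an illustrative reading, not the authors' actual code: the function name `gaussian_pairwise_similarity` and the exact Gaussian form `µ · exp(−‖z_i − z_j‖² / (2σ²))` for G_σ are assumptions based on the hyperparameters named in Algorithm 1 (µ, σ²).

```python
import torch
import torch.nn.functional as F

def gaussian_pairwise_similarity(z, mu=1.0, sigma=1.0):
    """Gaussian similarity over all distinct pairs in a batch.

    Illustrative sketch only: assumes G_sigma(z_i, z_j) =
    mu * exp(-||z_i - z_j||^2 / (2 * sigma^2)), which may differ
    from the paper's exact kernel.

    z: (N, d) tensor of embeddings.
    Returns a flat tensor of length N*(N-1)/2, one value per pair,
    in the upper-triangle order used by torch.nn.functional.pdist.
    """
    # pdist returns the Euclidean distance for each of the
    # N*(N-1)/2 distinct row pairs in one call -- this is the
    # "pairwise summation" shortcut the paper refers to.
    d = F.pdist(z)                                   # shape: (N*(N-1)/2,)
    return mu * torch.exp(-d.pow(2) / (2 * sigma ** 2))

# Example with the paper's vision embedding dimension of 512.
z = F.normalize(torch.randn(8, 512), dim=1)
sims = gaussian_pairwise_similarity(z)
print(sims.shape)  # torch.Size([28]), i.e. 8*7/2 pairs
```

Summing (or averaging) `sims` then gives the pairwise term over a batch without materializing the full N×N similarity matrix.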