CR-MoE: Consistent Routed Mixture-of-Experts for Scaling Contrastive Learning

Authors: Ziyu Jiang, Guoqing Zheng, Yu Cheng, Ahmed Hassan Awadallah, Zhangyang Wang

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our findings validate CR-MoE as an effective and efficient image representation learner. Code is available at https://github.com/VITA-Group/CRMoE. Extensive experiments verify the effectiveness of the proposed regularization term. Compared to competitive state-of-the-art CL methods on ViT, the proposed CR-MoE achieves an improvement of 2.8 points at the same computational cost.
Researcher Affiliation | Collaboration | Ziyu Jiang (EMAIL), Texas A&M University; Guoqing Zheng (EMAIL), Microsoft Research; Yu Cheng (EMAIL), The Chinese University of Hong Kong; Ahmed Hassan Awadallah (EMAIL), Microsoft Research; Zhangyang Wang (EMAIL), University of Texas at Austin
Pseudocode | No | The paper describes the proposed method, CR-MoE, through text and a pipeline diagram (Figure 2). However, it does not contain a formally structured pseudocode or algorithm block.
Open Source Code | Yes | Code is available at https://github.com/VITA-Group/CRMoE.
Open Datasets | Yes | Our pre-training experiments are conducted on ImageNet-1K (Deng et al., 2009) following common practice (Chen et al., 2020a; He et al., 2020). For transfer few-shot learning, we consider 4-shot and 10-shot settings for three datasets: CIFAR10 (Krizhevsky et al., 2009), Pet37 (Parkhi et al., 2012), and Food101 (Bossard et al., 2014).
Dataset Splits | Yes | For semi-supervised learning, we consider 1% or 10% available labels of ImageNet (following the sampling in Chen et al. (2020b)). For transfer few-shot learning, we consider 4-shot and 10-shot settings for three datasets: CIFAR10 (Krizhevsky et al., 2009), Pet37 (Parkhi et al., 2012), and Food101 (Bossard et al., 2014).
Hardware Specification | Yes | Models are pre-trained on 32 Nvidia V100 GPUs. For inference of a single image on one A6000 GPU, the time costs are 1.25 ms and 1.07 ms for VMoE-S/16 and ViT-S/16, respectively. For training a batch of 1024 images on 8 A6000 GPUs, the time costs are 1.579 s and 1.425 s for VMoE-S/16 and ViT-S/16, respectively.
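The timings above quantify the cost of the MoE backbone relative to the dense ViT. A quick sanity check (not from the paper; computed here from the reported numbers) of the fractional slowdown:

```python
# Relative time overhead of VMoE-S/16 over plain ViT-S/16,
# computed from the timings reported in the hardware row.

def overhead(moe_time: float, dense_time: float) -> float:
    """Fractional slowdown of the MoE model relative to the dense one."""
    return moe_time / dense_time - 1.0

inference = overhead(1.25, 1.07)    # single image, one A6000 GPU
training = overhead(1.579, 1.425)   # batch of 1024 images, 8 A6000 GPUs

print(f"inference overhead: {inference:.1%}")  # about 17%
print(f"training overhead:  {training:.1%}")   # about 11%
```

So the per-step wall-clock cost of the MoE model is modestly higher than the dense baseline, even though only 2 of the 16 experts are active per token.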
Software Dependencies | No | Our implementation is based on PyTorch (Paszke et al., 2019) and the FastMoE (He et al., 2021a) library. The paper mentions PyTorch and FastMoE but does not specify their version numbers.
Experiment Setup | Yes | For the pre-training framework, we employ MoCo v3 (Chen et al., 2021b) and follow the same settings as MoCo v3 for data augmentation and learning specification: 3-layer MLP projection head, temperature τ = 0.2, momentum m = 0.99, random patch projection, cosine decay schedule (Loshchilov & Hutter, 2016), and 40-epoch warmup. For optimization, we employ the AdamW (Loshchilov & Hutter, 2017) optimizer with a weight decay of 0.1. ... The best searched lr is 5.0e-4 × BatchSize/256. For model ablations, we employ a shorter schedule of 100 epochs with a relatively small batch size of 1024. When comparing with state-of-the-art methods, we scale up to 300 epochs with a batch size of 3072. For the MoE network, we by default employ 16 expert candidates (n_e = 16) and always activate 2 of them (k = 2). For the loss terms, we employ λ = 0.2, α = 0.3, w_lb = 0.01, and w_G = 0.001, which are searched on 100-epoch training.
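The setup row can be collected into a single configuration sketch. This is not the authors' code; the dict keys are illustrative names, and the values are the hyperparameters reported above, including the linear learning-rate scaling rule lr = 5.0e-4 × BatchSize/256:

```python
# Hedged sketch of the reported CR-MoE pre-training hyperparameters.
# Key names are illustrative; values come from the experiment-setup row.

def scaled_lr(base_lr: float, batch_size: int, base_batch: int = 256) -> float:
    """Linear LR scaling: lr = base_lr * batch_size / base_batch."""
    return base_lr * batch_size / base_batch

config = {
    "optimizer": "AdamW",
    "weight_decay": 0.1,
    "temperature": 0.2,     # contrastive temperature tau
    "momentum": 0.99,       # MoCo momentum m
    "warmup_epochs": 40,
    "num_experts": 16,      # n_e: expert candidates per MoE layer
    "top_k": 2,             # k: experts activated per input
    "lambda": 0.2,
    "alpha": 0.3,
    "w_lb": 0.01,           # load-balancing loss weight
    "w_G": 0.001,
}

# Ablation schedule: 100 epochs, batch size 1024
print(scaled_lr(5.0e-4, 1024))  # about 0.002
# State-of-the-art comparison: 300 epochs, batch size 3072
print(scaled_lr(5.0e-4, 3072))  # about 0.006
```

Note that the loss weights (λ, α, w_lb, w_G) were searched only on the shorter 100-epoch schedule and then reused for the scaled-up 300-epoch runs.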