Enhancing Contrastive Clustering with Negative Pair-guided Regularization

Authors: Abhishek Kumar, Anish Chakrabarty, Sankha Subhra Mullick, Swagatam Das

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type | Experimental | NRCC's superiority is demonstrated across various datasets with different scales and cluster structures, outperforming 20 state-of-the-art methods. ... Empirical evaluation of the efficacy of NRCC with GridShift (Kumar et al., 2022) against the state-of-the-art in Section 4 shows a 4.7% improvement in clustering accuracy by BYOL+NRCC+UMAP+GridShift on average over eight datasets.
Researcher Affiliation | Collaboration | Abhishek Kumar, ENET Centre, Centre for Energy and Environmental Technologies, VSB – Technical University of Ostrava, Ostrava, Czech Republic; Anish Chakrabarty, Statistics and Mathematics Unit, Indian Statistical Institute, Kolkata, India; Sankha Subhra Mullick, Dolby Laboratories, India; Swagatam Das, Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, India.
Pseudocode | Yes | Algorithm 1: Augmented view generation with SGHMC. Algorithm 2: The proposed InfoNCE+NRCC. Algorithm 3: The proposed BYOL+NRCC.
Open Source Code | Yes | The code base is available at https://github.com/abhisheka456/NRCC.
Open Datasets | Yes | We consider four types of datasets, namely large-scale moderate-resolution CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and STL-10 (Coates et al., 2011); moderate-scale higher-resolution subsets of ImageNet such as ImageNet-10 and ImageNet-Dogs (Russakovsky et al., 2015); large-scale higher-resolution Tiny ImageNet (Le & Yang, 2015); and large-scale moderate-resolution long-tailed CIFAR-10-LT (Tang et al., 2020) and CIFAR-20-LT (Tang et al., 2020).
Dataset Splits | No | The paper does not explicitly provide specific training/test/validation dataset splits, percentages, or sample counts used for the experiments. It mentions using established datasets and refers to external protocols for ImageNet-1k but does not detail its own splitting methodology or specific splits for the reported results.
Hardware Specification | Yes | We use the same computing setup with four V100 GPUs while calculating the time in hours to ensure fairness.
Software Dependencies | No | The paper mentions using the Stochastic Gradient Descent (SGD) optimizer but does not specify any software libraries or frameworks with their version numbers (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We have rigorously trained all models for 1,000 epochs, following the conventional recommendations (Tao et al., 2021; Tsai et al., 2021). We have utilized the Stochastic Gradient Descent (SGD) optimizer with a cosine learning rate scheduler that includes a warm-up for the initial 50 updates. For MoCo (He et al., 2020), BYOL (Grill et al., 2020), and NRCC, we set the base learning rate to 0.05, dynamically scaling it with the batch size (β = 0.05 n/256). ... Finally, we set the temperature τ (searched between {0.1, 1, 10}) and the regularization weight λ (varied between {0.1, 0.5, 1}) to 0.1 each. In the case of SGHMC, we set δ1, δ2, δ3, and ζ as 0.1, 0.05, 0.99, and 1 ... Following conventional guidelines, the mini-batch size was 512 for MoCo and 256 for the remaining models, including NRCC. ... train a ResNet-50 for 200 epochs.
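The quoted setup pins down two concrete pieces that a reproduction would need: the batch-size-scaled cosine learning-rate schedule with a 50-step linear warm-up (β = 0.05 n/256), and an InfoNCE-style contrastive loss with temperature τ = 0.1. Below is a minimal sketch of both under standard definitions; the function names are illustrative, and the NRCC negative-pair regularizer itself is not reproduced here since the report does not quote its formula.

```python
import math
import numpy as np

def lr_at(step, total_steps, batch_size, warmup_steps=50, base_lr=0.05):
    """Cosine schedule with linear warm-up; base rate scaled as
    beta = 0.05 * n / 256, matching the quoted setup (illustrative helper)."""
    beta = base_lr * batch_size / 256
    if step < warmup_steps:
        return beta * (step + 1) / warmup_steps          # linear warm-up
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * beta * (1.0 + math.cos(math.pi * t))    # cosine decay to 0

def info_nce(z1, z2, tau=0.1):
    """Standard InfoNCE loss between two batches of view embeddings;
    positives sit on the diagonal of the (n, n) similarity matrix."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # L2-normalize views
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                           # scaled similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # cross-entropy on diag
```

With batch size 256 the base rate is exactly 0.05, while MoCo's batch size of 512 would scale it to 0.1, consistent with the linear-scaling rule in the quote.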