Dataset Ownership Verification in Contrastive Pre-trained Models
Authors: Yuechen Xie, Jie Song, Mengqi Xue, Haofei Zhang, Xingen Wang, Bingde Hu, Genlang Chen, Mingli Song
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method using six visual datasets (CIFAR10 Krizhevsky et al. (2009), CIFAR100 Krizhevsky et al. (2009), SVHN Netzer et al. (2011), ImageNette Howard (2019), ImageWoof Howard (2019) and ImageNet Deng et al. (2009)) and five contrastive learning algorithms (SimCLR, BYOL, SimSiam, MoCo v3, and DINO). The specific experimental setup is introduced in Section 4.1, results and analyses are presented in Section 4.2, the application of our method on the ImageNet pre-trained models is demonstrated in Section 4.3, and ablation studies are conducted in Section 4.4 and Appendix A. |
| Researcher Affiliation | Collaboration | 1. Zhejiang University, 2. Hangzhou City University, 3. Bangsheng Technology Co., Ltd., 4. Ningbo Tech University, 5. State Key Laboratory of Blockchain and Security, Zhejiang University, 6. Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security |
| Pseudocode | No | The paper describes the proposed method in Section 3 and provides a visual overview in Figure 2. However, it does not contain a formal pseudocode block or an algorithm section with structured, code-like steps. The steps are described narratively. |
| Open Source Code | Yes | The results demonstrate that our method rejects the null hypothesis with a p-value markedly below 0.05, surpassing all previous methodologies. Our code is available at https://github.com/xieyc99/DOV4CL. |
| Open Datasets | Yes | We evaluate our method using six visual datasets (CIFAR10 Krizhevsky et al. (2009), CIFAR100 Krizhevsky et al. (2009), SVHN Netzer et al. (2011), ImageNette Howard (2019), ImageWoof Howard (2019) and ImageNet Deng et al. (2009)) |
| Dataset Splits | Yes | To simulate D_alt, a dataset similar to D_pub but without overlapping data (as described in Section 3.1), we randomly divide a dataset into two subsets of equal size representing D_pub and D_alt, respectively. For D_pvt, we set it as the testing set of the undivided dataset for convenience. Specific settings are as follows: Experiment 1: D_pub is a random half of the CIFAR10 training set and D_alt is the other half. D_unre, D_sdw and D_pvt are the SVHN, CIFAR100 and CIFAR10 testing sets, respectively. Experiment 2: D_pub is a random half of the ImageNette training set and D_alt is the other half. D_unre, D_sdw and D_pvt are the ImageWoof, SVHN and ImageNette testing sets, respectively. |
| Hardware Specification | Yes | All experiments are conducted on four NVIDIA RTX A6000s and one NVIDIA GeForce RTX 4090. |
| Software Dependencies | No | The paper mentions several contrastive learning algorithms (SimCLR, BYOL, SimSiam, MoCo v3, DINO), model architectures (VGG16, ResNet18, ViT-T, ViT-S, ViT-B), and optimizers (SGD). However, it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | On CIFAR10/CIFAR100/SVHN, we pre-train the encoder for 800 epochs with a batch size of 512. On ImageNette/ImageWoof, the encoder with non-ViT-S/16 architecture is pre-trained for 800 epochs, while the ViT-S/16 architecture is pre-trained for 2000 epochs with a batch size of 64. The initial learning rate for all pre-training sessions is set at 0.06 and adjusted using a Cosine Annealing scheduler. The optimizer is SGD, with a momentum of 0.9 and a weight decay of 5×10⁻⁴. All experiments are conducted on four NVIDIA RTX A6000s and one NVIDIA GeForce RTX 4090. In all experiments, we set M = 2 and N = 6. Both T_g and T_l are composed of random cropping, color jitter, random flipping, and random grayscale, with respective cropping ranges of (0.4, 1.0) and (0.05, 0.4). |
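The split protocol quoted in the Dataset Splits row (a random, equal, non-overlapping division of a training set into D_pub and D_alt) can be sketched in plain Python. The function name and fixed seed below are illustrative, not from the paper; 50,000 matches the CIFAR10 training-set size used in Experiment 1.

```python
import random

def split_pub_alt(indices, seed=0):
    """Randomly split a training set's indices into two equal,
    disjoint halves, D_pub and D_alt, as in the paper's protocol.
    (Function name and seed are illustrative assumptions.)"""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# CIFAR10 has 50,000 training images; each half gets 25,000.
d_pub, d_alt = split_pub_alt(range(50000))
```

Because the two halves are slices of one shuffled permutation, they are disjoint by construction, which is exactly the "similar but non-overlapping" property D_alt needs.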
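The learning-rate schedule in the Experiment Setup row (initial LR 0.06, cosine annealing over the pre-training epochs) can be written in closed form. Since the paper names no framework, the standard cosine-annealing formula below is an assumption, with `lr_min = 0` as a further illustrative default.

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_init=0.06, lr_min=0.0):
    # Standard cosine-annealing schedule: starts at lr_init and
    # decays smoothly to lr_min over total_epochs.
    return lr_min + 0.5 * (lr_init - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

lr_start = cosine_annealing_lr(0, 800)    # 0.06 at epoch 0
lr_mid = cosine_annealing_lr(400, 800)    # ≈ 0.03 at the midpoint
lr_end = cosine_annealing_lr(800, 800)    # ≈ 0.0 at the final epoch
```

In a PyTorch setup this would correspond to `torch.optim.SGD(..., lr=0.06, momentum=0.9, weight_decay=5e-4)` paired with `torch.optim.lr_scheduler.CosineAnnealingLR`, though the paper does not confirm that framework.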