Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
Authors: Jingyang Li, Jiachun Pan, Vincent Y. F. Tan, Kim-Chuan Toh, Pan Zhou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results corroborate our theoretical findings and the enhanced generalization capability of SA-FixMatch. To corroborate our theoretical results, we evaluate SL, FixMatch, and SA-FixMatch on CIFAR-100 (Krizhevsky et al., 2009), STL-10 (Coates et al., 2011), Imagewoof (Howard & Gugger, 2020), and ImageNet (Deng et al., 2009). |
| Researcher Affiliation | Academia | Jingyang Li¹, Jiachun Pan¹, Vincent Y. F. Tan¹, Kim-Chuan Toh¹, Pan Zhou²; ¹National University of Singapore, ²Singapore Management University |
| Pseudocode | Yes | Appendix J: (SA-)FixMatch Algorithm. In this section, we present the detailed algorithm framework for FixMatch (Sohn et al., 2020) and SA-FixMatch. At iteration t, we first sample a batch of B labeled data X^(t) from the labeled dataset Z_l and a batch of μB unlabeled data U^(t) from the unlabeled dataset Z_u. Then, according to Algorithm 1, we calculate the loss for the current iteration and use it to update the neural network model F^(t). The only difference between FixMatch and SA-FixMatch is in line 6: FixMatch adopts CutOut in its strong augmentation A of unlabeled data, while SA-FixMatch adopts SA-CutOut.<br>Algorithm 1: (SA-)FixMatch algorithm.<br>1: Input: labeled batch X^(t) = {(X_i, y_i) : i ∈ (1, ..., B)}, unlabeled batch U^(t) = {U_i : i ∈ (1, ..., μB)}, confidence threshold τ, unlabeled data ratio μ, unlabeled loss weight λ.<br>2: L_s^(t) = −(1/B) Σ_{i=1}^{B} log logit_{y_i}(F^(t), α(X_i)) {cross-entropy loss for labeled data}<br>3: for i = 1 to μB do<br>4: v_i = argmax_j logit_j(F^(t), α(U_i)) {prediction after applying weak data augmentation to U_i}<br>5: end for<br>6: L_u^(t) = −(1/μB) Σ_{i=1}^{μB} 1{logit_{v_i}(F^(t), α(U_i)) ≥ τ} log logit_{v_i}(F^(t), A(U_i)) {cross-entropy loss with pseudo-labels and confidence masking for unlabeled data}<br>7: return L_s^(t) + λL_u^(t) |
| Open Source Code | No | For FixMatch experiments, we base our implementation on Kim (2020), while all other experiments follow Wang et al. (2022a). (The references are to third-party code/benchmarks, not the authors' own implementation for this paper.) |
| Open Datasets | Yes | To corroborate our theoretical results, we evaluate SL, FixMatch, and SA-FixMatch on CIFAR-100 (Krizhevsky et al., 2009), STL-10 (Coates et al., 2011), Imagewoof (Howard & Gugger, 2020), and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | For each experiment in Sec. 5, following Sohn et al. (2020); Zhang et al. (2021a); Xu et al. (2021); Wang et al. (2022b); Chen et al. (2023), we randomly select image-label pairs from the entire training dataset according to the labeled data amount, use the images from the whole training dataset, without labels, as the unlabeled dataset, and use the standard test dataset. Table 9 (Summary of Datasets) details data statistics: STL-10: 105,000 training, 5,000 labeled, 8,000 test; CIFAR-100: 50,000 training, 50,000 labeled, 10,000 test; Imagewoof: 9,025 training, 9,025 labeled, 3,929 test; ImageNet: 1,281,167 training, 1,281,167 labeled, 50,000 test. |
| Hardware Specification | Yes | All experiments are conducted on four RTX 3090 GPUs (24GB memory). |
| Software Dependencies | No | No specific versions of software dependencies (e.g., Python, PyTorch, CUDA) are mentioned in the paper, only that the optimizer is standard SGD. The implementation for FixMatch experiments is based on Kim (2020), which is a PyTorch implementation. |
| Experiment Setup | Yes | For hyper-parameters, we use the same setting as FixMatch (Sohn et al., 2020). Concretely, the optimizer for all experiments is standard stochastic gradient descent (SGD) with a momentum of 0.9 (Sutskever et al., 2013). For all datasets, we use an initial learning rate of 0.03 with a cosine learning rate decay schedule (Loshchilov & Hutter, 2016), η = η₀ cos(7πk/(16K)), where η₀ is the initial learning rate, k is the current training step, and K is the total number of training steps, set to 307,200. We also maintain an exponential moving average with a momentum of 0.999. Table 10 (complete hyper-parameter setting): CIFAR-100: WRN-28-8, weight decay 1e-3; STL-10: WRN-37-2, weight decay 5e-4; Imagewoof: WRN-37-2, weight decay 5e-4; ImageNet: ResNet-50, weight decay 3e-4. Batch size 64 (128 for ImageNet); unlabeled data ratio µ = 7 (1 for ImageNet); confidence threshold τ = 0.95 (0.7 for ImageNet); learning rate η = 0.03; SGD momentum 0.9; EMA momentum 0.999; unsupervised loss weight λ = 1. |
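The unsupervised loss in Algorithm 1 (pseudo-label from the weakly augmented view, cross-entropy on the strongly augmented view, masked by a confidence threshold) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the `softmax` helper and the function name are assumptions, and the paper's `logit` denotes the softmax probability.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fixmatch_unsup_loss(weak_logits, strong_logits, tau=0.95):
    """Unsupervised FixMatch loss (line 6 of Algorithm 1):
    pseudo-labels v_i come from the weakly augmented view alpha(U_i);
    cross-entropy is computed on the strongly augmented view A(U_i),
    kept only where the weak-view confidence exceeds tau."""
    probs_weak = softmax(weak_logits)             # predictions on alpha(U_i)
    pseudo = probs_weak.argmax(axis=1)            # v_i = argmax_j logit_j
    conf = probs_weak.max(axis=1)                 # confidence of pseudo-label
    mask = (conf >= tau).astype(float)            # indicator 1{conf >= tau}
    probs_strong = softmax(strong_logits)         # predictions on A(U_i)
    ce = -np.log(probs_strong[np.arange(len(pseudo)), pseudo])
    return (mask * ce).mean()                     # average over the mu*B batch
```

Lowering τ admits more (less confident) pseudo-labels into the loss, which is the knob the paper sets to 0.95 for CIFAR-100/STL-10/Imagewoof and 0.7 for ImageNet.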
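The training schedule described above (cosine decay η = η₀ cos(7πk/(16K)) and an EMA of the model parameters with momentum 0.999) can be sketched in a few lines. Function names are illustrative, not from the paper's code.

```python
import math

def cosine_lr(k, K=307200, eta0=0.03):
    """Cosine learning rate decay used by FixMatch:
    eta = eta0 * cos(7 * pi * k / (16 * K)),
    decaying from eta0 at k=0 to eta0*cos(7*pi/16) at k=K."""
    return eta0 * math.cos(7 * math.pi * k / (16 * K))

def ema_update(ema_params, params, m=0.999):
    """Exponential moving average of model parameters with momentum m."""
    return [m * e + (1 - m) * p for e, p in zip(ema_params, params)]
```

Note the schedule never reaches zero: at k = K the rate is still η₀ cos(7π/16) ≈ 0.195 η₀, a deliberate property of the 7/16 fraction in the FixMatch recipe.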