Selective Label Enhancement Learning for Test-Time Adaptation

Authors: Yihao Hu, Congyu Qiao, Xin Geng, Ning Xu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on various benchmark datasets validate the effectiveness of the proposed approach. The source code is available at https://github.com/palm-ml/PASLE. We employ four domain generalization datasets including PACS (Li et al., 2017a), VLCS (Torralba & Efros, 2011), Office-Home (Venkateswara et al., 2017), and DomainNet (Peng et al., 2019).
Researcher Affiliation Academia 1 School of Computer Science and Engineering, Southeast University, Nanjing, China 2 Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Pseudocode Yes Algorithm 1 PASLE Algorithm
Open Source Code Yes The source code is available at https://github.com/palm-ml/PASLE.
Open Datasets Yes We employ four domain generalization datasets including PACS (Li et al., 2017a), VLCS (Torralba & Efros, 2011), Office-Home (Venkateswara et al., 2017), and DomainNet (Peng et al., 2019). Additionally, we employ two image corruption datasets: CIFAR-10-C and CIFAR-100-C (Hendrycks & Dietterich, 2019). Both datasets introduce 15 types of common image corruptions, categorized into four types: noise, blur, weather, and digital, to the test sets of CIFAR-10 and CIFAR-100 (Krizhevsky, 2009).
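Since the target domains above are taken at the highest corruption severity, the following minimal sketch (not from the paper) shows how that selection is typically done. It assumes the standard CIFAR-10-C distribution format, where each corruption is one array of 50,000 images: the 10,000 test images stacked at 5 severity levels in increasing order, so severity 5 is the last 10,000 entries.

```python
import numpy as np

N_PER_SEVERITY = 10_000  # CIFAR-10/100 test set size, repeated per severity level

def severity_slice(images: np.ndarray, labels: np.ndarray, severity: int = 5):
    """Return the (images, labels) block for one severity level (1..5)."""
    assert 1 <= severity <= 5
    start = (severity - 1) * N_PER_SEVERITY
    end = severity * N_PER_SEVERITY
    return images[start:end], labels[start:end]

# Dummy stand-in arrays with the same layout as e.g. gaussian_noise.npy
images = np.zeros((50_000, 32, 32, 3), dtype=np.uint8)
labels = np.tile(np.arange(N_PER_SEVERITY) % 10, 5)
x, y = severity_slice(images, labels, severity=5)
```

In practice `images` and `labels` would come from `np.load` on the per-corruption `.npy` files.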
Dataset Splits Yes For source training, we designate one domain as the target and use the remaining domains as source domains. We allocated 20% of the data from the source domains for validation purposes. We use the training sets of CIFAR-10 and CIFAR-100 as source domains and the highest level of corruption in CIFAR-10-C and CIFAR-100-C as target domains.
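The split protocol quoted above (one held-out target domain, remaining domains as sources, 20% of source data for validation) can be sketched as follows. This is an illustrative assumption, not the authors' code; the domain names follow PACS and the `(sample_id, domain)` data layout is hypothetical.

```python
import random

DOMAINS = ["photo", "art_painting", "cartoon", "sketch"]  # PACS domains

def leave_one_domain_out(data, target_domain, val_frac=0.2, seed=0):
    """Split (sample_id, domain) pairs into source-train, source-val, target."""
    source = [s for s in data if s[1] != target_domain]
    target = [s for s in data if s[1] == target_domain]
    rng = random.Random(seed)
    rng.shuffle(source)
    n_val = int(len(source) * val_frac)  # 20% of pooled source data for validation
    return source[n_val:], source[:n_val], target

# Toy example: 100 samples per domain
data = [(i, d) for d in DOMAINS for i in range(100)]
train, val, target = leave_one_domain_out(data, "sketch")
```

Each domain takes a turn as the target, so a full evaluation loops this over all four domains.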
Hardware Specification Yes Experiments were carried out on the clipart domain of the DomainNet dataset, using ResNet-18 as the backbone with a batch size of 128 on an NVIDIA TITAN Xp GPU.
Software Dependencies No The paper mentions using ResNet-18 and ResNet-50 models, the Adam optimizer, ImageNet-1K pre-trained models, and batch normalization, but does not provide specific version numbers for these software components.
Experiment Setup Yes For source training, the models are trained using the Adam optimizer with a learning rate of 5e-5 for domain generalization benchmarks and 1e-3 for image corruption benchmarks. All weights are initialized from ImageNet-1K (Russakovsky et al., 2015) pre-trained models. During testing, we also utilize the Adam optimizer to update all trainable layers without the need for a specific selection. The batch size for the online target domain data is set to 128, with the buffer capacity K set to one-fourth of the batch size, i.e., 32. The learning rate is selected from the range between 1e-3 and 1e-6. The value of τstart is determined by the number of classes in each dataset: for example, VLCS contains 5 classes, while DomainNet has 345 classes, leading to different τstart values for each dataset. The threshold gap, represented as |τstart - τend|, is consistently set at 0.1. Furthermore, τdes is uniformly set to 1e-3 for all datasets, except for the large-scale dataset DomainNet, where it is adjusted to 1e-4.
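The threshold hyperparameters above (τstart, a fixed gap of 0.1, and a per-step decrement τdes) suggest a simple decaying schedule; the sketch below encodes that reading as an assumption, since this excerpt does not give the exact update rule or the per-dataset τstart values (0.9 here is a placeholder, and the buffer size follows the stated K = batch size / 4).

```python
BATCH_SIZE = 128
K = BATCH_SIZE // 4  # buffer capacity, i.e. 32 as stated above

def threshold_schedule(tau_start=0.9, gap=0.1, tau_des=1e-3):
    """Assumed semantics: decay tau by tau_des per online batch,
    clamped at tau_end = tau_start - gap (gap |tau_start - tau_end| = 0.1)."""
    tau_end = tau_start - gap
    tau = tau_start
    while True:
        yield tau
        tau = max(tau - tau_des, tau_end)

sched = threshold_schedule()
first = next(sched)                       # initial threshold tau_start
taus = [next(sched) for _ in range(200)]  # after 200 online batches
```

Under these placeholder values the threshold bottoms out at 0.8 after 100 steps and stays there, matching the fixed 0.1 gap.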