Denoising Pretrained Black-box Models via Amplitude-Guided Phase Realignment
Authors: Hongliang Ni, Tong Chen, Shazia Sadiq, Gianluca Demartini
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a variety of popular pre-trained vision and language models suggest that, even with a simple linear classifier, our method can enhance downstream performance across a range of in-domain and out-of-domain tasks. |
| Researcher Affiliation | Academia | Hongliang Ni, School of Electrical Engineering and Computer Science, University of Queensland; ARC Training Centre for Information Resilience |
| Pseudocode | Yes | Algorithm 1: Amplitude-Guided Phase Realignment Classifier (Lorem) |
| Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Datasets. We validate our model on seven in-domain (ID) vision tasks and two out-of-domain (OOD) vision tasks. For the ID tasks, we use seven downstream datasets: CIFAR-10/100 (Krizhevsky et al., 2009), Caltech101 (Fei-Fei et al., 2004), Food101 (Bossard et al., 2014), EuroSAT (Helber et al., 2019), RESISC45 (Cheng et al., 2017), and Stanford Cars (Krause et al., 2013). For the OOD tasks, we use the real subset of DomainNet (Peng et al., 2019) as the training set and the sketch subset as the testing set, and vice versa. For text tasks, we validate our model on GLUE (Wang et al., 2018) and GLUE-X (Yang et al., 2022), for both ID and OOD evaluation. |
| Dataset Splits | Yes | We use the same datasets and identical train/validation/test splits, reporting the average accuracy across five runs on each dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify its version or other key software dependencies with version numbers. |
| Experiment Setup | Yes | We train each downstream classifier for 30 epochs using the Adam optimizer. The learning rate is set to 0.001 for the other baselines and 0.0001 for our proposed method. [...] The hyperparameter sensitivity analysis of the global scaling factor ε in Lorem is presented here, where we evaluate on three representative in-distribution (ID) vision datasets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and Food101 (Bossard et al., 2014). |
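The table above cites Algorithm 1 (an amplitude-guided phase realignment classifier with a global scaling factor ε) but does not reproduce it, and no code was released. The sketch below is a minimal, hypothetical NumPy illustration of what Fourier-domain, amplitude-guided phase realignment of frozen backbone features *could* look like: the function name `realign_phase`, the parameter `eps`, and the specific weighting scheme are all assumptions, not the authors' implementation.

```python
import numpy as np

def realign_phase(features: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Hypothetical amplitude-guided phase realignment of a feature batch.

    Transforms each feature vector to the Fourier domain, then shrinks the
    phase of low-amplitude components toward a common alignment, scaled by
    the global factor `eps` (the hyperparameter studied in the paper's
    sensitivity analysis). This is an illustrative guess at the mechanism,
    not the authors' code.
    """
    spectrum = np.fft.fft(features, axis=-1)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Amplitude-weighted shrinkage: high-amplitude (signal) components keep
    # most of their phase; low-amplitude (noise) components are pulled
    # toward zero phase.
    weight = amplitude / (amplitude.max(axis=-1, keepdims=True) + 1e-12)
    realigned_phase = phase * (1.0 - eps * (1.0 - weight))
    denoised = amplitude * np.exp(1j * realigned_phase)
    return np.fft.ifft(denoised, axis=-1).real

# Example: denoise a batch of 4 frozen 16-d backbone features; per the
# paper, a simple linear classifier would then be trained on the output
# (Adam, 30 epochs, learning rate 1e-4).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))
clean = realign_phase(feats, eps=0.1)
print(clean.shape)  # (4, 16)
```

Note that with `eps = 0.0` the phase is untouched and the input is recovered exactly, which makes ε behave as a dial between "no denoising" and full amplitude-guided realignment.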