Whitened CLIP as a Likelihood Surrogate of Images and Captions

Authors: Roy Betser, Meir Yossef Levi, Guy Gilboa

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present several preliminary experiments demonstrating the properties and applicability of these likelihood scores to images and captions. All the experiments in this section employ the CLIP ViT-L/14 model and utilize the MS-COCO validation set to compute the whitening matrix W."
Researcher Affiliation | Academia | "1 Viterbi Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel. Correspondence to: Roy Betser <EMAIL>, Meir Yossef Levi <EMAIL>, Guy Gilboa <EMAIL>."
Pseudocode | Yes | Algorithm 1 (Whitening Process).
Input: dataset X ∈ R^(N×d), correlation threshold τ. Output: whitening matrix W.
Step 1 (Compute correlation matrix): C_ij = Cov(X_i, X_j).
Step 2 (Remove highly correlated features): identify feature pairs (i, j) where |C_ij| > τ; for each pair, remove one feature (e.g., j) and replace it with random noise r ~ N(0, 0.1). Denote the updated dataset X'.
Step 3 (Compute covariance matrix): Σ = (1/N) X'^T X'.
Step 4 (Eigenvalue decomposition): decompose Σ into eigenvalues Λ and eigenvectors V, so that Σ = V Λ V^T.
Step 5 (Compute whitening matrix and transform data): W = Λ^(-1/2) V^T, where Λ^(-1/2) is a diagonal matrix whose entries are the inverse square roots of the eigenvalues: (Λ^(-1/2))_ii = 1/√λ_i.
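The five steps of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' released code: the function name, the default threshold, and the use of the uncentered second-moment matrix as Σ are assumptions based on the description above.

```python
# Sketch of Algorithm 1 (Whitening Process); illustrative, not the authors' code.
import numpy as np

def compute_whitening_matrix(X, tau=0.99, rng=None):
    """Return the whitening matrix W for a dataset X of shape (N, d)."""
    rng = np.random.default_rng(rng)
    X = X.copy()
    N, d = X.shape
    # Step 1: correlation matrix of the d features.
    C = np.corrcoef(X, rowvar=False)
    # Step 2: for each pair (i, j) with |C_ij| > tau, replace feature j
    # with random noise r ~ N(0, 0.1), as described in the algorithm.
    replaced = set()
    for i in range(d):
        for j in range(i + 1, d):
            if abs(C[i, j]) > tau and j not in replaced:
                X[:, j] = rng.normal(0.0, 0.1, size=N)
                replaced.add(j)
    # Step 3: covariance matrix Sigma = (1/N) X'^T X'.
    Sigma = X.T @ X / N
    # Step 4: eigendecomposition Sigma = V Lambda V^T.
    eigvals, V = np.linalg.eigh(Sigma)
    # Step 5: W = Lambda^{-1/2} V^T (assumes strictly positive eigenvalues).
    W = np.diag(1.0 / np.sqrt(eigvals)) @ V.T
    return W
```

Applying Z = X W^T then yields (1/N) Z^T Z ≈ I, i.e., decorrelated, unit-variance features.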
Open Source Code | Yes | "Our code, along with detailed instructions, is available HERE."
Open Datasets | Yes | "The 5000 embeddings of the MS-COCO validation set (Lin et al., 2014) are divided into 20 equal groups of 250 samples each. Fig. 3 evaluates a subset of ImageNet (Deng et al., 2009), as presented in Kan et al. (2018), in comparison to ImageNet-A (Hendrycks et al., 2021b), ImageNet-C (Hendrycks & Dietterich, 2019), and ImageNet-R (Hendrycks et al., 2021a). Flickr8k (Hodosh et al., 2013), similarly to MS-COCO, is a benchmark for image-captioning tasks that emphasizes real-world imagery and descriptive diversity. We sampled 5,000 sentences from OpenWebText (Gokaslan et al., 2019), a general text dataset..."
Dataset Splits | Yes | "For stability, the 5000 embeddings of the MS-COCO validation set (Lin et al., 2014) are divided into 20 equal groups of 250 samples each. For each size (1k, 2k, 3k, 4k), we randomly sampled 5 subsets of the MS-COCO validation set."
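As a minimal sketch, the grouping and subsampling described in this row could look as follows (the embedding dimension, seed, and random placeholder data are assumptions for illustration):

```python
# Illustrative sketch of the dataset splits described above (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
# Placeholder for the 5000 MS-COCO validation embeddings; dimension 768 is assumed.
embeddings = rng.normal(size=(5000, 768))

# Divide into 20 equal groups of 250 samples each (used for stability estimates).
groups = np.split(embeddings, 20)

# For each size in {1k, 2k, 3k, 4k}, draw 5 random subsets without replacement.
subsets = {
    size: [embeddings[rng.choice(len(embeddings), size=size, replace=False)]
           for _ in range(5)]
    for size in (1000, 2000, 3000, 4000)
}
```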
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cluster specifications) are mentioned in the paper for running experiments.
Software Dependencies | No | The paper mentions several models and frameworks (e.g., CLIP ViT-L/14, GPT-2, UnCLIP) but does not provide specific version numbers for any ancillary software dependencies like Python, PyTorch, or other libraries.
Experiment Setup | No | The paper describes the models used (e.g., CLIP ViT-L/14, UnCLIP) and the datasets (MS-COCO, ImageNet), along with specific processing steps like whitening and normalization, but it does not specify concrete hyperparameters like learning rates, batch sizes, or number of epochs for training any models used in the experiments.