Zero-shot CLIP Class Forgetting via Text-image Space Adaptation
Authors: Alexey Kravets, Vinay P. Namboodiri
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We do a performance comparison in Tab. 1 showing that our method both outperforms the previous methods and is more robust to different visual encoders, achieving perfect class forgetting with ViT (Dosovitskiy et al., 2020) and ResNet (He et al., 2015). We analyse through ablations the importance of the retain and forget loss components in Section 7.2, and how the placement of the forget-class projection in the image-text space affects the forgetting ability of the model in Section 7.5. We find that retaining the knowledge of non-forget classes requires the inclusion of semantically similar classes, which can be generated using a large language model (LLM). This is because projecting the forget class to a different space primarily affects the closest classes in the image-text embedding space; thus, it is important to preserve this part of the space, while non-semantically similar classes are retained without explicit inclusion. We conduct a thorough ablation analysis on how the number of semantically similar classes affects performance in Section 7.3. Additionally, in Section 7.4 we assess how including semantically different classes affects performance. |
| Researcher Affiliation | Academia | Alexey Kravets (EMAIL), Department of Computer Science, University of Bath; Vinay P. Namboodiri (EMAIL), Department of Computer Science, University of Bath |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The abstract states: "Full implementation can be found here." However, a direct link or specific reference to supplementary material is not provided in the parsed text. |
| Open Datasets | Yes | Following (Kravets & Namboodiri, 2025) we evaluate CLIP's forgetting capabilities on four high-quality, fine-grained datasets: Caltech101 (Fei-Fei et al., 2007) contains images from 101 distinct categories, each representing various objects or scenes. Stanford Cars (Krause et al., 2013) contains images of cars of different makes and models. Oxford Flowers (Nilsback & Zisserman, 2008) includes images of flowers of 102 different classes. Stanford Dogs (Khosla et al., 2011) comprises 120 classes of dogs of different species. We use the Pins Faces (Burak) dataset that contains 105 celebrity faces for this purpose. Burak. Pins face recognition dataset. URL: kaggle.com/datasets/hereisburak/pins-face-recognition. ...taken from the Food101 (Bossard et al., 2014) dataset... |
| Dataset Splits | No | The paper mentions using well-known datasets for evaluation (Caltech101, Stanford Cars, Oxford Flowers, Stanford Dogs, Pins Faces, Food101) but does not explicitly state the specific train/test/validation splits used for their experiments or if standard splits from these datasets were uniformly applied. The text focuses on evaluation methodology and metrics rather than dataset partitioning. |
| Hardware Specification | No | The paper mentions: "We ran experiments using two versions of CLIP where either ResNet50 or ViT-B/16 visual encoders were used." and "Acknowledgements: We'd like to gratefully acknowledge Microsoft's compute support through Microsoft's Accelerating Foundation Models Research grant and the support from University of Bath for the studentship." However, it does not provide specific details such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" and refers to "CLIP" as a model. However, it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | We fix λ1 and λ3 while λ2 is determined iteratively. At each iteration, we assess the reduction in the second component of the loss to evaluate whether the change in the projection matrix P is sufficient to project the forget class to the new chosen vector. We start from a fixed λ2 and increment it in small steps until the reduction in the second loss component exceeds 0.75% of its initial value. Additional implementation details are described in the Appendix. We ran experiments using two versions of CLIP where either ResNet50 or ViT-B/16 visual encoders were used. For both models we use a λ1 of 0.3, a λ3 of 1, and a varying λ2 with initial value of 1.1, incremented by 0.05 until the reduction in the second loss component exceeds 0.75% of its initial value. We optimize the low-rank matrices A and B of rank r = 5 for 2000 iterations using the Adam optimizer with a learning rate of 0.01, saving the weights that achieve the minimum loss. |
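The λ2 schedule described in the Experiment Setup row can be sketched as follows. This is a toy reconstruction, not the authors' implementation: the loss here is a stand-in quadratic (distance of the projected forget embedding from a chosen target vector), the embeddings are random unit vectors, plain gradient descent replaces Adam to keep the sketch dependency-free, and the cap on λ2 is a safety bound not mentioned in the paper. Only the schedule itself (start at 1.1, step by 0.05, stop once the second loss component drops by more than 0.75% of its initial value) and the rank-5 low-rank update P = P0 + A @ B follow the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 5                       # toy embedding dim; rank r = 5 as in the paper
P0 = np.eye(d)                     # frozen projection; A @ B is the low-rank update
forget = rng.normal(size=d)        # toy forget-class text embedding
forget /= np.linalg.norm(forget)   # CLIP embeddings are unit-normalized
target = rng.normal(size=d)        # chosen vector the forget class is projected to
target /= np.linalg.norm(target)

def second_loss(A, B):
    """Toy stand-in for the second (forget) loss component."""
    return float(np.sum(((P0 + A @ B) @ forget - target) ** 2))

def train(lam2, iters=2000, lr=0.01):
    """Optimize A, B by gradient descent (the paper uses Adam),
    keeping the weights that achieve the minimum loss."""
    A = 0.01 * rng.normal(size=(d, r))
    B = 0.01 * rng.normal(size=(r, d))
    best_loss, best_A, best_B = np.inf, A, B
    for _ in range(iters):
        resid = (P0 + A @ B) @ forget - target
        gP = 2.0 * np.outer(resid, forget)       # dL/dP for the toy loss
        A, B = A - lr * lam2 * gP @ B.T, B - lr * lam2 * A.T @ gP
        loss = second_loss(A, B)
        if loss < best_loss:
            best_loss, best_A, best_B = loss, A.copy(), B.copy()
    return best_loss, best_A, best_B

# Outer schedule: start lambda_2 at 1.1 and step by 0.05 until the second
# loss component drops by more than 0.75% of its initial value.
init = second_loss(np.zeros((d, r)), np.zeros((r, d)))
lam2 = 1.1
while True:
    loss, A, B = train(lam2)
    if (init - loss) / init > 0.0075 or lam2 > 3.0:  # cap is a safety bound only
        break
    lam2 += 0.05
```

On this toy objective the first λ2 already clears the 0.75% threshold; in the paper the check decides whether the change in P suffices to move the forget class to the chosen vector, so several increments may be needed.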