The Double-Ellipsoid Geometry of CLIP
Authors: Meir Yossef Levi, Guy Gilboa
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our statistical analysis and many experimental results are based on the MS-COCO (Lin et al., 2014) validation set, a common standard image-text dataset. In Fig. 2, normalized histograms are shown for features 93, 134 and 494 of the CLIP latent vector. To empirically analyze the uniformity and alignment terms in Eq. 8 alongside the overall loss in Eq. 7, we use the MS-COCO validation set. The results show that the loss for correctly classified samples decreases monotonically with the shift toward the origin. |
| Researcher Affiliation | Academia | 1Viterbi Faculty of Electrical and Computer Engineering, Technion Israel Institute of Technology, Haifa, Israel. Correspondence to: Meir Yossef Levi <EMAIL>, Guy Gilboa <EMAIL>. |
| Pseudocode | No | The paper describes methods and analyses using mathematical equations and prose, but it does not contain any explicitly labeled pseudocode blocks or algorithms in a structured, code-like format. |
| Open Source Code | No | The paper references existing frameworks and methods (e.g., "The unCLIP framework (Ramesh et al., 2022)", "text inversion (Han et al., 2024; Gal et al., 2022; Mokady et al., 2023)"), but it does not contain any statement from the authors about releasing their own source code for the methodology described in this paper, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Our statistical analysis and many experimental results are based on the MS-COCO (Lin et al., 2014) validation set, a common standard image-text dataset. We provide additional visualizations of high- and low-conformity images across various datasets. Figure 19 illustrates examples of sketches from ImageNet-R, while Figure 20 showcases examples from ImageNet-A. |
| Dataset Splits | Yes | Our statistical analysis and many experimental results are based on the MS-COCO (Lin et al., 2014) validation set, a common standard image-text dataset. We treat the entire validation set (5k samples) as a single batch. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | No | The paper discusses the normalized temperature-scaled cross entropy loss (NT-Xent) used in CLIP's training and analyzes its behavior, including varying a parameter 'alpha' for analytical purposes. However, it does not specify concrete hyperparameters like learning rates, batch sizes, number of epochs, or other system-level training settings for their own experiments or for reproducing the analyzed CLIP model. |
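For context on the loss the paper analyzes: CLIP is trained with a symmetric normalized temperature-scaled cross-entropy (NT-Xent) objective over paired image and text embeddings. A minimal NumPy sketch of that objective is below; this is an illustration of the standard NT-Xent form, not the authors' code, and the temperature value `tau=0.07` is a commonly cited default, not a value taken from this paper.

```python
import numpy as np

def clip_nt_xent(img_emb, txt_emb, tau=0.07):
    """Symmetric NT-Xent (InfoNCE) loss of the kind used to train CLIP.

    img_emb, txt_emb: (n, d) arrays of paired image/text embeddings,
    where row i of each array belongs to the same image-text pair.
    tau: softmax temperature (0.07 is an assumed common default).
    """
    # L2-normalize so inner products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau          # (n, n) similarity matrix
    labels = np.arange(len(img))        # matched pairs lie on the diagonal

    def xent(lg):
        # Row-wise softmax cross-entropy against the diagonal labels.
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned pairs (e.g. `clip_nt_xent(np.eye(4), np.eye(4))`) the loss approaches zero, while mismatched pairs drive it up, which is the behavior the paper's uniformity/alignment decomposition (Eq. 8) studies.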