Towards an Explainable Comparison and Alignment of Feature Embeddings
Authors: Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide numerical results demonstrating SPEC's application to compare and align embeddings on large-scale datasets such as ImageNet and MS-COCO. In this section, we first discuss the experimental settings and then apply the SPEC algorithm to compare different image and text embeddings across various large-scale datasets. |
| Researcher Affiliation | Academia | ¹The Chinese University of Hong Kong, ²Sharif University of Technology. Correspondence to: Mohammad Jalali <EMAIL>, Bahar Dibaei Nia <EMAIL>, Farzan Farnia <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Spectral Pairwise Embedding Comparison (SPEC). 1: Input: sample set {x_1, ..., x_n}, embeddings ψ1 and ψ2, kernel feature maps ϕ1 and ϕ2. 2: Initialize C_ψ1 = 0_{d1×d1}, C_ψ2 = 0_{d2×d2}, C_{ψ1,ψ2} = 0_{d1×d2}. 3: for i ∈ {1, ..., n} do 4: Update C_ψ1 ← C_ψ1 + (1/n) ϕ1(ψ1(x_i)) ϕ1(ψ1(x_i))ᵀ 5: Update C_ψ2 ← C_ψ2 + (1/n) ϕ2(ψ2(x_i)) ϕ2(ψ2(x_i))ᵀ 6: Update C_{ψ1,ψ2} ← C_{ψ1,ψ2} + (1/n) ϕ1(ψ1(x_i)) ϕ2(ψ2(x_i))ᵀ 7: end for 8: Construct Γ_{ψ1,ψ2} as in Equation (4). 9: Compute eigenvalues λ_{1:d1+d2} and eigenvectors v_{1:d1+d2} of the non-symmetric matrix Γ_{ψ1,ψ2}. 10: for i ∈ {1, ..., d1+d2} do 11: Map eigenvector u_i = [ϕ1(ψ1(X)), ϕ2(ψ2(X))] v_i 12: end for 13: Output: eigenvalues λ_1, ..., λ_{d1+d2} and eigenvectors u_1, ..., u_{d1+d2}. |
| Open Source Code | No | The project page is available at https://mjalali.github.io/SPEC/. (This is a project page, not explicitly a code repository. The instruction states that project demonstration pages or high-level project overview pages are not considered sufficient for 'Yes' unless they explicitly host the source code, which is not stated here.) |
| Open Datasets | Yes | In our experiments on image data, we used four datasets: AFHQ (Choi et al., 2020) (15K animal faces in categories of cats, wildlife, and dogs), FFHQ (Karras et al., 2019) (70K human-face images), ImageNet-1K (Deng et al., 2009) (1.4 million images across 1,000 labels), and MS-COCO 2017 (Lin et al., 2015) (~110K samples of diverse scenes with multiple objects). |
| Dataset Splits | No | In our experiments on image data, we used four datasets: AFHQ (Choi et al., 2020) (15K animal faces in categories of cats, wildlife, and dogs), FFHQ (Karras et al., 2019) (70K human-face images), ImageNet-1K (Deng et al., 2009) (1.4 million images across 1,000 labels), and MS-COCO 2017 (Lin et al., 2015) (~110K samples of diverse scenes with multiple objects). We used the OpenCLIP GitHub repository (link) and used the MS-COCO 2017 training set, which consists of 120K pairs of texts and images. (The paper lists datasets and mentions "training set" for MS-COCO and ImageNet but does not provide explicit details on how these datasets were split into training, validation, and test sets, or reference standard splits with sufficient detail.) |
| Hardware Specification | Yes | The experiments were performed on two RTX-4090 GPUs. |
| Software Dependencies | No | The paper mentions using the 'OpenCLIP GitHub repository' and 'PyTorch's eig command' but does not specify version numbers for any key software components such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | accum_freq = 1; alignment_loss_weight = 0.1; batch_size = 128; clip_alignment_contrastive_loss_weight = 0.9; coca_contrastive_loss_weight = 1.0; distributed = True; epochs = 10; lr = 1e-05; lr_scheduler = cosine; model = ViT-B-32; name = Vit-B-32 laion2b e16 freeze 5; precision = amp; pretrained = laion2b e16; seed = 0; wd = 0.2 |
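The Algorithm 1 listing quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the paper's Equation (4) for Γ_{ψ1,ψ2} is not reproduced in the audit, so the block matrix used below (stacking the covariances C_ψ1, C_ψ2 and cross-covariance C_{ψ1,ψ2}) is a hypothetical placeholder, and the vectorized matrix products replace the per-sample accumulation loop of steps 3-7.

```python
import numpy as np

def spec_sketch(X, psi1, psi2, phi1, phi2):
    """Sketch of SPEC Algorithm 1: covariance accumulation plus
    eigendecomposition of a non-symmetric block matrix.

    X            : list of n input samples
    psi1, psi2   : the two embeddings being compared
    phi1, phi2   : kernel feature maps applied on top of each embedding
    """
    n = len(X)
    # Feature matrices ϕ1(ψ1(X)) ∈ R^{n×d1} and ϕ2(ψ2(X)) ∈ R^{n×d2}.
    F1 = np.stack([phi1(psi1(x)) for x in X])
    F2 = np.stack([phi2(psi2(x)) for x in X])
    # Steps 3-7 of Algorithm 1, vectorized: empirical (cross-)covariances.
    C1 = F1.T @ F1 / n          # C_ψ1,    shape (d1, d1)
    C2 = F2.T @ F2 / n          # C_ψ2,    shape (d2, d2)
    C12 = F1.T @ F2 / n         # C_ψ1,ψ2, shape (d1, d2)
    # Step 8: Γ_{ψ1,ψ2}. Placeholder block form -- substitute the paper's
    # actual Equation (4) here.
    Gamma = np.block([[C1, C12], [-C12.T, -C2]])
    # Step 9: eigendecomposition of the non-symmetric (d1+d2) matrix.
    lam, V = np.linalg.eig(Gamma)
    # Steps 10-12: map each eigenvector back to sample space,
    # u_i = [ϕ1(ψ1(X)), ϕ2(ψ2(X))] v_i.
    U = np.concatenate([F1, F2], axis=1) @ V
    return lam, U
```

Because Γ is non-symmetric, `np.linalg.eig` (rather than `eigh`) is required and the returned eigenvalues may be complex; the audit notes the paper itself relies on PyTorch's `eig` for the same reason.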
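The flattened hyperparameter list in the Experiment Setup row can be collected into a single configuration mapping. The dictionary below is a sketch only: the key names are normalized guesses at OpenCLIP-style flags (the audit text gives the parameters as space-separated tokens without explicit key syntax), and the values are copied verbatim from the row.

```python
# Hypothetical reconstruction of the reported fine-tuning configuration.
# Key names are assumptions; values are taken directly from the audit table.
config = {
    "accum_freq": 1,
    "alignment_loss_weight": 0.1,
    "batch_size": 128,
    "clip_alignment_contrastive_loss_weight": 0.9,
    "coca_contrastive_loss_weight": 1.0,
    "distributed": True,
    "epochs": 10,
    "lr": 1e-05,
    "lr_scheduler": "cosine",
    "model": "ViT-B-32",
    "name": "Vit-B-32 laion2b e16 freeze 5",
    "precision": "amp",
    "pretrained": "laion2b e16",
    "seed": 0,
    "wd": 0.2,
}

# Sanity check: the two contrastive-loss weights sum to 1.0, consistent
# with a convex combination of alignment and CLIP contrastive objectives.
assert config["alignment_loss_weight"] + config["clip_alignment_contrastive_loss_weight"] == 1.0
```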