Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Authors: Lorenzo Basile, Santiago Acevedo, Luca Bortolussi, Fabio Anselmi, Alex Rodriguez
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first validate our method on synthetic data in controlled environments, showcasing its advantages and drawbacks compared to existing techniques. Subsequently, we extend our analysis to large-scale applications in neural network representations. Specifically, we focus on latent representations of multimodal data, uncovering clear correlations between paired visual and textual embeddings, whereas existing methods struggle significantly in detecting similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds. |
| Researcher Affiliation | Academia | ¹University of Trieste ²AREA Science Park ³SISSA ⁴MIT ⁵ICTP |
| Pseudocode | No | The paper describes methods and formulas but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Our code is available at https://github.com/lorenzobasile/IDCorrelation. |
| Open Datasets | Yes | We feed our MLP with the MNIST (LeCun et al., 1998) dataset at increasing degrees of nonlinearity... Moving to a more realistic setting, we test our method on measuring similarity between ImageNet (Russakovsky et al., 2015) embeddings... We use three datasets, N24News (Wang et al., 2022), MS-COCO 2014 (Lin et al., 2014) and Flickr30k (Young et al., 2014), all consisting of image-caption pairs. |
| Dataset Splits | Yes | for all our experiments we randomly sample a subset of 30000 data points... Specifically, we randomly shuffle the embeddings generated by a model while keeping the labels unchanged, and then compare this modified dataset with the original dataset prior to shuffling. In other words, given a point with index i of class Ci, we pair it with another randomly chosen point j of class Cj under the condition that Ci = Cj. |
| Hardware Specification | Yes | We performed all the computations on a NVIDIA A100 GPU, equipped with 40GB of RAM. |
| Software Dependencies | No | Our implementation follows closely that of Ansuini et al. (2019), which we translated to PyTorch to enable GPU acceleration... We used them in their PyTorch implementations provided by Miranda (2021) (SVCCA), Maiorca (2024) (CKA) and Zhen et al. (2022) (Distance Correlation), with minor adaptations. Pretrained models were obtained from the Transformers library by Hugging Face (Wolf et al., 2020); details on the checkpoints we employed are provided in the Appendix (section A.8). |
| Experiment Setup | Yes | To illustrate this phenomenon, we showcase a simple example: we consider a randomly initialized multilayer perceptron (MLP), made of 15 fully connected layers of 784 neurons, followed by a Leaky ReLU activation... We feed our MLP with the MNIST (LeCun et al., 1998) dataset at increasing degrees of nonlinearity (which corresponds to decreasing the slope) and compute the correlation between the representation at the final layer and the input data... To mitigate this, we assign a p-value to the observed correlation, employing a permutation test (Davison & Hinkley, 1997) on Id(XY). Specifically, we estimate the Id of several independent samples of the joint dataset, created by concatenating the two original datasets and randomizing the pairings to disrupt any existing correlations... where S is the total number of permuted samples considered (100 in most of our experiments). |
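The class-conditional shuffling described in the Dataset Splits row (re-pairing each point i with a random point j of the same class, labels unchanged) can be sketched as follows. This is not the authors' code; `shuffle_within_classes` is a hypothetical helper name, and the exact pairing procedure in the paper may differ in detail.

```python
import numpy as np

def shuffle_within_classes(embeddings, labels, seed=None):
    """Return a copy of `embeddings` whose rows are permuted only
    within each class: every point i gets re-paired with a randomly
    chosen point j satisfying C_i = C_j, while `labels` stay valid."""
    rng = np.random.default_rng(seed)
    shuffled = embeddings.copy()
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)          # positions of class c
        shuffled[idx] = embeddings[rng.permutation(idx)]
    return shuffled
```

Permuting indices within each class guarantees the shuffled dataset contains exactly the same embeddings with the same label distribution, so any drop in similarity to the original dataset is attributable only to the broken pairings.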
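The permutation test quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `twonn_id` is a plain NumPy/SciPy version of the TwoNN maximum-likelihood estimator (Facco et al., 2017, the estimator underlying Ansuini et al., 2019), `id_correlation_pvalue` is a hypothetical helper name, and the assumption that correlated pairings yield a *lower* joint Id than shuffled ones is my reading of the quoted procedure.

```python
import numpy as np
from scipy.spatial.distance import cdist

def twonn_id(X):
    """TwoNN maximum-likelihood intrinsic-dimension estimate:
    Id = N / sum(log(r2 / r1)), with r1, r2 each point's first and
    second nearest-neighbor distances."""
    d = cdist(X, X)
    np.fill_diagonal(d, np.inf)
    r = np.sort(d, axis=1)[:, :2]        # (r1, r2) per point
    mu = r[:, 1] / r[:, 0]
    mu = mu[mu > 1.0]                    # drop duplicate-point ratios
    return len(mu) / np.log(mu).sum()

def id_correlation_pvalue(X, Y, n_perm=100, seed=None):
    """Permutation test on the Id of the joint dataset [X | Y]:
    randomize the X-Y pairings n_perm times and count how often the
    shuffled joint Id is at most the observed one (one-sided test,
    standard +1 correction)."""
    rng = np.random.default_rng(seed)
    observed = twonn_id(np.hstack([X, Y]))
    null = np.array([
        twonn_id(np.hstack([X, Y[rng.permutation(len(Y))]]))
        for _ in range(n_perm)
    ])
    p = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, p
```

With `n_perm=100`, as in most of the paper's experiments, the smallest attainable p-value is 1/101; the pairwise-distance matrix makes this sketch O(N²) in memory, which is why a GPU implementation is used for the 30000-point subsets.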