A Unifying Framework for Representation Learning
Authors: Shaden Alshammari, John Hershey, Axel Feldmann, William Freeman, Mark Hamilton
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We not only present a wide array of proofs, connecting over 23 different approaches, but we also leverage these theoretical results to create state-of-the-art unsupervised image classifiers that achieve a +8% improvement over the prior state-of-the-art on unsupervised classification on ImageNet-1K. ... We evaluate the I-Con framework using the ImageNet-1K dataset (Deng et al., 2009), which consists of 1,000 classes and over one million high-resolution images. This dataset is considered one of the most challenging benchmarks for unsupervised image classification due to its scale and complexity. To ensure a fair comparison with prior works, we strictly adhere to the experimental protocol introduced by Adaloglou et al. (2023). The primary metric for evaluating clustering performance is Hungarian accuracy... |
| Researcher Affiliation | Collaboration | Shaden Alshammari (MIT), John Hershey (Google), Axel Feldmann (MIT), William T. Freeman (MIT, Google), Mark Hamilton (MIT, Microsoft) |
| Pseudocode | No | Figure 3 provides "code-style configurations" for SNE, SimCLR, and K-Means, which illustrate how these methods can be expressed using the I-Con framework's components (e.g., SNE_model = ICon(...)). These are not clearly labeled pseudocode or algorithm blocks describing the I-Con algorithm itself, but rather configuration examples. |
| Open Source Code | Yes | https://aka.ms/i-con |
| Open Datasets | Yes | We evaluate the I-Con framework using the ImageNet-1K dataset (Deng et al., 2009)... We use I-Con to design a debiasing strategy that improves unsupervised ImageNet-1K accuracy by +8%, with additional gains of +3% on CIFAR-100 and +2% on STL-10 in linear probing. ... The models were trained on the CIFAR-10 dataset for 1000 epochs... |
| Dataset Splits | Yes | We evaluate the I-Con framework using the ImageNet-1K dataset (Deng et al., 2009)... To ensure a fair comparison with prior works, we strictly adhere to the experimental protocol introduced by Adaloglou et al. (2023). ... The models were trained on the CIFAR-10 dataset for 1000 epochs... For evaluation, we used two methods: (1) linear probing on the 512-dimensional embeddings from the MLP's hidden layer, and (2) k-nearest neighbors (k = 3) classification based on the same embeddings for CIFAR-10 (in-distribution) and CIFAR-100 (out-of-distribution). |
| Hardware Specification | No | The paper mentions using DINO pre-trained Vision Transformer (ViT) models and different sized backbones (ViT-S/14, ViT-B/14, and ViT-L/14) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "ADAM (Kingma & Ba, 2017)" as an optimizer, but it does not specify version numbers for any programming languages, libraries, or other software components used in the experiments. |
| Experiment Setup | Yes | The training process involved optimizing a linear classifier on top of the features extracted by the DINO models. Each model was trained for 30 epochs, using ADAM (Kingma & Ba, 2017) with a batch size of 4096 and an initial learning rate of 1e-3. We decayed the learning rate by a factor of 0.5 every 10 epochs to allow for stable convergence. We do not apply additional normalization to the feature vectors. |
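The primary metric quoted above, Hungarian accuracy, matches each predicted cluster to a ground-truth class via optimal assignment before scoring. A minimal sketch of how such a metric is typically computed follows; `hungarian_accuracy` is a hypothetical helper (not code from the paper), built on SciPy's `linear_sum_assignment`:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_accuracy(y_true, y_pred, n_classes):
    """Clustering accuracy under the best one-to-one mapping between
    predicted clusters and ground-truth classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Contingency matrix: rows = predicted cluster, cols = true class.
    counts = np.zeros((n_classes, n_classes), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        counts[p, t] += 1
    # linear_sum_assignment minimizes cost, so negate to maximize matches.
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / len(y_true)
```

Because cluster indices are arbitrary, a relabeling of the predictions (e.g., swapping cluster ids 0 and 1) leaves the score unchanged.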
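The Dataset Splits row mentions k-nearest neighbors (k = 3) classification on the learned embeddings. A self-contained sketch of that evaluation step, assuming a plain Euclidean-distance majority vote (the paper's exact distance metric is not quoted; `knn_predict` is a hypothetical helper):

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training
    embeddings under squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

In practice this would run over 512-dimensional embedding vectors rather than the toy 2-D points used in testing.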
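The Experiment Setup row specifies a step-decay schedule: initial learning rate 1e-3, halved every 10 epochs over 30 epochs of training. The resulting schedule can be written out directly (`lr_at_epoch` is an illustrative helper, not the paper's code):

```python
def lr_at_epoch(epoch, base_lr=1e-3, gamma=0.5, step=10):
    """Step decay: multiply the base learning rate by gamma
    once per completed block of `step` epochs."""
    return base_lr * gamma ** (epoch // step)

# Over 30 epochs this yields three plateaus: 1e-3, 5e-4, 2.5e-4.
schedule = [lr_at_epoch(e) for e in range(30)]
```

This corresponds to a standard step scheduler (e.g., PyTorch's `StepLR` with `step_size=10, gamma=0.5`) wrapped around the Adam optimizer described in the row.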