I-Con: A Unifying Framework for Representation Learning

Authors: Shaden Alshammari, John Hershey, Axel Feldmann, William Freeman, Mark Hamilton

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We not only present a wide array of proofs, connecting over 23 different approaches, but we also leverage these theoretical results to create state-of-the-art unsupervised image classifiers that achieve a +8% improvement over the prior state-of-the-art on unsupervised classification on ImageNet-1K. ... We evaluate the I-Con framework using the ImageNet-1K dataset (Deng et al., 2009), which consists of 1,000 classes and over one million high-resolution images. This dataset is considered one of the most challenging benchmarks for unsupervised image classification due to its scale and complexity. To ensure a fair comparison with prior works, we strictly adhere to the experimental protocol introduced by Adaloglou et al. (2023). The primary metric for evaluating clustering performance is Hungarian accuracy..."
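For readers unfamiliar with the metric quoted above: Hungarian accuracy scores a clustering by finding the one-to-one matching between predicted cluster ids and ground-truth classes that maximizes agreement, then reporting plain accuracy under that matching. A minimal sketch (the function name and toy data are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_accuracy(y_true, y_pred, num_classes):
    """Best accuracy over one-to-one matchings of cluster ids to class labels."""
    # Confusion matrix: rows = predicted cluster, cols = true class.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[p, t] += 1
    # Hungarian algorithm on the negated matrix maximizes matched counts.
    rows, cols = linear_sum_assignment(-cm)
    return cm[rows, cols].sum() / len(y_true)
```

For example, a clustering that is perfect up to a relabeling (all cluster ids swapped) still scores 1.0 under this metric.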
Researcher Affiliation | Collaboration | Shaden Alshammari (1), John Hershey (2), Axel Feldmann (1), William T. Freeman (1, 2), Mark Hamilton (1, 3); 1 MIT, 2 Google, 3 Microsoft
Pseudocode | No | Figure 3 provides "code-style configurations" for SNE, SimCLR, and K-Means, illustrating how these methods can be expressed using the I-Con framework's components (e.g., SNE_model = ICon(...)). These are configuration examples rather than clearly labeled pseudocode or algorithm blocks describing the I-Con algorithm itself.
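The shared objective behind such configurations can be sketched generically: I-Con-style methods compare a supervisory neighborhood distribution with a learned one via a KL divergence averaged over data points. A minimal sketch, where the function name and the row-stochastic matrix representation are our assumptions, not the authors' API:

```python
import numpy as np

def icon_style_loss(p, q, eps=1e-12):
    """Average KL(p_i || q_i) over points i, where p and q are (n, n)
    row-stochastic neighborhood matrices (supervisory vs. learned)."""
    kl_rows = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return float(np.mean(kl_rows))
```

Choosing how p and q are parameterized recovers different methods (e.g., Gaussian neighborhoods in embedding space for SNE-style objectives); the loss is zero exactly when the two neighborhood distributions agree.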
Open Source Code | Yes | https://aka.ms/i-con
Open Datasets | Yes | "We evaluate the I-Con framework using the ImageNet-1K dataset (Deng et al., 2009)... We use I-Con to design a debiasing strategy that improves unsupervised ImageNet-1K accuracy by +8%, with additional gains of +3% on CIFAR-100 and +2% on STL-10 in linear probing. ... The models were trained on the CIFAR-10 dataset for 1000 epochs..."
Dataset Splits | Yes | "We evaluate the I-Con framework using the ImageNet-1K dataset (Deng et al., 2009)... To ensure a fair comparison with prior works, we strictly adhere to the experimental protocol introduced by Adaloglou et al. (2023). ... The models were trained on the CIFAR-10 dataset for 1000 epochs... For evaluation, we used two methods: (1) linear probing on the 512-dimensional embeddings from the MLP's hidden layer, and (2) k-nearest neighbors (k = 3) classification based on the same embeddings for CIFAR-10 (in-distribution) and CIFAR-100 (out-of-distribution)."
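The k-nearest-neighbors (k = 3) evaluation quoted above amounts to a majority vote over the closest training embeddings. A toy sketch (not the authors' code; names and data are illustrative):

```python
import numpy as np

def knn_predict(train_emb, train_labels, test_emb, k=3):
    """Classify each test embedding by majority vote over its k nearest
    training embeddings (Euclidean distance)."""
    # Pairwise distances, shape (n_test, n_train).
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nn_idx = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest neighbors
    votes = train_labels[nn_idx]            # their labels, shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])
```

In the protocol described, the embeddings would be the frozen 512-dimensional hidden-layer features, with no further training of the classifier.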
Hardware Specification | No | The paper mentions using DINO pre-trained Vision Transformer (ViT) models with different sized backbones (ViT-S/14, ViT-B/14, and ViT-L/14) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions using "ADAM (Kingma & Ba, 2017)" as an optimizer. However, it does not specify version numbers for any programming languages, libraries, or other software components used in the experiments.
Experiment Setup | Yes | "The training process involved optimizing a linear classifier on top of the features extracted by the DINO models. Each model was trained for 30 epochs, using ADAM (Kingma & Ba, 2017) with a batch size of 4096 and an initial learning rate of 1e-3. We decayed the learning rate by a factor of 0.5 every 10 epochs to allow for stable convergence. We do not apply additional normalization to the feature vectors."
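The linear-probe setup quoted above can be sketched in PyTorch. Only the optimizer settings (Adam, lr 1e-3, halved every 10 epochs, 30 epochs, batch size 4096) come from the text; the feature dimension, class count, and synthetic stand-in data are our assumptions for precomputed DINO features:

```python
import torch
import torch.nn as nn

# Assumed dimensions: 768-d backbone features, 1000 ImageNet-1K classes.
feature_dim, num_classes = 768, 1000
classifier = nn.Linear(feature_dim, num_classes)

# Settings from the quoted setup: Adam, lr 1e-3, x0.5 decay every 10 epochs.
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
criterion = nn.CrossEntropyLoss()

# Stand-in loader: one small synthetic batch (real runs use batches of 4096
# precomputed, unnormalized backbone features).
features = torch.randn(32, feature_dim)
labels = torch.randint(0, num_classes, (32,))
loader = [(features, labels)]

for epoch in range(30):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(classifier(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # halves the learning rate every 10 epochs
```

After 30 epochs the learning rate has been halved three times, ending at 1.25e-4.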