Personalized Representation from Personalized Generation

Authors: Shobhita Sundaram, Julia Chae, Yonglong Tian, Sara Beery, Phillip Isola

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our learned representations for four downstream tasks: classification, retrieval, detection, and segmentation, and find that performance universally improves. We show that our method improves personalized representation learning for diverse downstream tasks, from recognition to segmentation, and analyze characteristics of image generation approaches that are key to this gain.
Researcher Affiliation | Collaboration | Shobhita Sundaram (MIT), Julia Chae (MIT), Yonglong Tian (OpenAI), Sara Beery (MIT), Phillip Isola (MIT)
Pseudocode | No | The paper describes methods and pipelines in prose and with figures (e.g., Figure 2 for the training pipeline, Section 3.4 for the InfoNCE loss), but it does not contain any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Our website (https://personalized-rep.github.io/) and GitHub repository (https://github.com/ssundaram21/personalized-rep) contain the source code for our work, including the necessary metadata to reproduce results, such as the LLM-generated captions used for dataset synthesis.
Open Datasets | Yes | We introduce a new dataset, PODS (Personal Object Discrimination Suite). PODS features common personal and household objects, enabling instance-level evaluation across classification, retrieval, detection, and segmentation tasks. We release our new dataset, PODS, and the reformulated DeepFashion2 and DogFaceNet datasets.
Dataset Splits | Yes | All datasets are split such that for each object there are exactly 3 training images and at least 3 test images. Each dataset is randomly divided class-wise into a validation set (30 classes) and a test set (varying size).
Hardware Specification | Yes | We report the wall-clock runtimes of synthetic data generation methods, using a single NVIDIA A100 GPU, in Table 3.
Software Dependencies | Yes | We generate personalized data from DR using Stable Diffusion 1.5, a T2I model, as our generator gθ. We adapt gθ using DreamBooth (Ruiz et al., 2022) to generate novel images of c when conditioned on an identifier token. Following prior works, we generate image captions with GPT-4 (OpenAI, 2023).
Experiment Setup | Yes | We fine-tune via Low-Rank Adaptation (LoRA), which is more parameter-efficient than full fine-tuning (Hu et al., 2021). We LoRA fine-tune with the InfoNCE loss for 2 epochs over 4500 anchor-positive pairs, drawn from 450 synthetic positives and 1000 synthetic negatives. We use the following hyperparameters to LoRA fine-tune each backbone: Learning rate: 0.0003, Batch size: 16, LoRA rank: 16, LoRA alpha: 0.5, LoRA dropout: 0.3.
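The InfoNCE objective referenced in the rows above can be sketched as follows. This is a minimal NumPy illustration of the per-anchor loss over one positive and a pool of negatives, not the authors' released implementation; the temperature value is an assumed placeholder, and the function name is hypothetical.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one anchor-positive pair against a set of negatives.

    anchor, positive: (d,) embedding vectors; negatives: (n, d) matrix.
    Embeddings are L2-normalized so dot products are cosine similarities.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(anchor)
    p = normalize(positive)
    negs = normalize(negatives)

    pos_sim = (a @ p) / temperature       # scalar similarity to the positive
    neg_sim = (negs @ a) / temperature    # (n,) similarities to the negatives

    # Cross-entropy with the positive as the target class (index 0).
    logits = np.concatenate([[pos_sim], neg_sim])
    return float(-pos_sim + np.log(np.sum(np.exp(logits))))
```

The loss is near zero when the anchor matches the positive and is orthogonal to the negatives, and grows as the anchor drifts toward a negative, which is the behavior the fine-tuning objective relies on.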