Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints

Authors: Yuxuan Wu, Ziyu Wang, Bhiksha Raj, Gus Xia

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results show that V3 generalizes across multiple domains and modalities, successfully learning disentangled content and style representations... Experimental results show that our approach achieves more robust content-style disentanglement than unsupervised baselines, and outperforms even supervised methods... We evaluate V3 on both synthetic and real data..." |
| Researcher Affiliation | Academia | Mohamed bin Zayed University of Artificial Intelligence; New York University Shanghai; Carnegie Mellon University |
| Pseudocode | No | The paper describes the model architecture and methodology using text, equations, and diagrams (Figure 2), but does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | "Code is available at https://github.com/Irislucent/variance-versus-invariance. Demo can be found at https://v3-content-style.github.io/V3-demo/." |
| Open Datasets | Yes | Street View House Numbers (SVHN) (Netzer et al., 2011); Sprites with Actions Dataset (Sprites) (Yingzhen and Mandt, 2018); Librispeech Clean 100 Hours (Libri100) (Panayotov et al., 2015) |
| Dataset Splits | Yes | "The dataset is split into the train set, validation set, and test set with a ratio of 8:1:1. ... We split the extra partition into additional training, validation and testing sets with a ratio of 8:1:1. ... We use 80% of the characters for training and the rest for validation and testing." |
| Hardware Specification | Yes | "All models are trained on a single Nvidia RTX 4090 GPU." |
| Software Dependencies | No | The paper mentions using the Adam optimizer and refers to models such as ResNet18, but does not specify versions for any programming languages or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | "For all models, we use the Adam optimizer with a learning rate of 0.001 (Kingma and Ba, 2014). The fragment sizes on PhoneNums, InsNotes, SVHN and Sprites are set to 10, 12, 2 and 6, respectively. The relativity r is set to 15, 15, 5, 10 and 5 on PhoneNums, InsNotes, SVHN, Sprites and Libri100, respectively... The V3 loss weight β is set to 1 by default on the InsNotes task and to 0.1 on the other datasets. The commitment loss weight α is set to 0.01." |
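The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration for reference. The sketch below is a hypothetical reconstruction, not code from the paper's repository: the names (`CONFIG`, `total_loss`, etc.) and the additive form of the loss combination are assumptions; only the numeric values (learning rate, fragment sizes, relativity `r`, β, α) come from the quoted text.

```python
# Hypothetical reconstruction of the reported experiment setup.
# Only the numbers are taken from the paper's quoted text; all names
# and the loss-combination form below are illustrative assumptions.
CONFIG = {
    "PhoneNums": {"fragment_size": 10, "relativity_r": 15, "beta": 0.1},
    "InsNotes":  {"fragment_size": 12, "relativity_r": 15, "beta": 1.0},
    "SVHN":      {"fragment_size": 2,  "relativity_r": 5,  "beta": 0.1},
    "Sprites":   {"fragment_size": 6,  "relativity_r": 10, "beta": 0.1},
    "Libri100":  {"relativity_r": 5,   "beta": 0.1},  # fragment size not quoted
}
ALPHA = 0.01          # commitment loss weight (all datasets)
LEARNING_RATE = 0.001 # Adam (Kingma and Ba, 2014), all models

def total_loss(recon: float, v3: float, commitment: float, dataset: str) -> float:
    """Assumed weighted sum: reconstruction + beta * V3 loss + alpha * commitment."""
    beta = CONFIG[dataset]["beta"]
    return recon + beta * v3 + ALPHA * commitment
```

For example, with unit reconstruction and V3 losses, the InsNotes weighting (β = 1) counts the V3 term ten times more heavily than the same terms would be counted on SVHN (β = 0.1).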