Disentangling Representations through Multi-task Learning

Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel

ICLR 2025

Reproducibility assessment. Each entry lists the variable, the assessed result, and the supporting excerpt from the paper.
Research Type: Experimental. "We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve multi-task evidence accumulation classification tasks..."; "We experimentally validate these predictions in RNNs trained on multi-task classification..." (Section 5, Experiments)
Researcher Affiliation: Academia. Pantelis Vafidis and Aman Bhargava, Computation and Neural Systems, California Institute of Technology; Antonio Rangel, Humanities and Social Sciences, California Institute of Technology.
Pseudocode: No. The paper contains no explicitly labeled pseudocode or algorithm blocks. It gives mathematical equations for the RNN dynamics (Equation 5) and a graphical model (Figure S7), but these are not presented as structured algorithms.
Open Source Code: Yes. "All code used to generate the results can be found in https://github.com/panvaf/DisentangleRes."
Open Datasets: No. The data are generated synthetically rather than drawn from a public dataset: "A ground truth x is sampled and Gaussian noise is added to arrive at X(t). The task is to report whether x lies above (1) or below (0) each of the classification lines (color matches the corresponding boolean variable in y), given the noisy and non-linearly transformed samples f(X(1)), ..., f(X(t))."
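The synthetic data-generation scheme quoted above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the line angles, sampling ranges, and the names `latent_dim`, `sigma`, and `n_steps` are assumptions.

```python
import numpy as np

# Illustrative sketch of the synthetic task: a latent ground truth is sampled,
# Gaussian noise is added at each time step, and each boolean target asks
# whether the latent lies above one of several classification lines.
rng = np.random.default_rng(0)

latent_dim, sigma, n_steps = 2, 0.2, 20

x = rng.uniform(-1.0, 1.0, size=latent_dim)             # ground-truth latent
X = x + sigma * rng.normal(size=(n_steps, latent_dim))  # noisy samples X(t)

# Normals of three hypothetical classification lines through the origin
angles = np.array([0.0, np.pi / 4, np.pi / 2])
normals = np.stack([-np.sin(angles), np.cos(angles)], axis=1)

y_true = (normals @ x > 0).astype(int)  # 1 if x lies above each line, else 0
```

A non-linear observation map f would then be applied to each row of X before it reaches the network.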
Dataset Splits: Yes. "To quantify the disentanglement of the representations after learning, we evaluate regression generalization by training a linear decoder to predict the ground truth x while network weights are frozen. We perform out-of-distribution 4-fold cross-validation, i.e. train the decoder on 3 out of 4 quadrants and test in the remaining quadrant (Appendix A.2 for details)."
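The quadrant-based out-of-distribution cross-validation can be sketched as follows. The hidden states `H` here are a stand-in (a random linear embedding of the latents) for the frozen network's representations, and the least-squares decoder is an illustrative choice; only the 3-train / 1-test quadrant split mirrors the quoted procedure.

```python
import numpy as np

# OOD 4-fold cross-validation over quadrants: fit a linear decoder on three
# quadrants of latent space, evaluate R^2 on the held-out quadrant.
rng = np.random.default_rng(1)

D, N_neu, n_trials = 2, 64, 2000
x = rng.uniform(-1.0, 1.0, size=(n_trials, D))         # ground-truth latents
W = rng.normal(size=(D, N_neu))
H = x @ W + 0.05 * rng.normal(size=(n_trials, N_neu))  # stand-in hidden states

# Quadrant index 0..3 from the signs of the two latent coordinates
quadrant = (x[:, 0] > 0).astype(int) + 2 * (x[:, 1] > 0).astype(int)

r2_scores = []
for q in range(4):
    train, test = quadrant != q, quadrant == q
    coef, *_ = np.linalg.lstsq(H[train], x[train], rcond=None)
    pred = H[test] @ coef
    ss_res = ((x[test] - pred) ** 2).sum()
    ss_tot = ((x[test] - x[test].mean(axis=0)) ** 2).sum()
    r2_scores.append(1.0 - ss_res / ss_tot)
```

For a disentangled (linearly decodable) representation, all four held-out R² scores stay high; entangled representations fail to generalize to the unseen quadrant.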
Hardware Specification: No. The paper specifies the model types used (RNNs, LSTMs, GPT-2 transformers) and their architectural details, but gives no hardware specifics such as GPU model, CPU type, or memory.
Software Dependencies: No. "The network is trained with a cross-entropy loss and Adam default settings, except learning rate η₀ = 10⁻³, to produce the target outputs y(x*)." While Adam is mentioned, no version numbers are provided for it or for any other software library (e.g., PyTorch, TensorFlow, CUDA).
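The stated objective, cross-entropy minimized with Adam at its default settings except η₀ = 10⁻³, can be illustrated on a toy logistic readout. This is a hedged numpy-only sketch, not the paper's training loop; the batch, features, and step count are arbitrary.

```python
import numpy as np

# Toy illustration: binary cross-entropy loss minimized with Adam
# (default betas/eps, lr = 1e-3) on a linear readout.
rng = np.random.default_rng(2)

def bce(logits, targets):
    # Numerically stable binary cross-entropy with logits
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

X = rng.normal(size=(16, 8))       # batch size 16, as in Table S1
y = (X[:, 0] > 0).astype(float)    # one boolean target per trial
w = np.zeros(8)

lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8
m, v = np.zeros_like(w), np.zeros_like(w)
losses = []
for t in range(1, 201):
    logits = X @ w
    losses.append(bce(logits, y))
    grad = X.T @ (1 / (1 + np.exp(-logits)) - y) / len(y)
    m = b1 * m + (1 - b1) * grad                 # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2            # second-moment estimate
    w -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
```

In practice the same objective would be expressed in a framework such as PyTorch (`torch.optim.Adam` with `lr=1e-3`), but the review's point stands: no framework versions are pinned.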
Experiment Setup: Yes. "Table S1 summarizes all hyperparameters and their values, which are shared across all architectures."

Parameter | Value        | Explanation
Δt        | 100 ms       | Euler integration step size
τ         | 100 ms       | neuronal time constant
N_neu     | 64           | number of hidden neurons
σ         | 0.2          | input noise standard deviation
T         | 20           | trial duration (in Δt's)
η₀        | 0.001/0.003  | Adam learning rate (fixed/free RT)
B         | 16           | batch size
N_batch   | 10⁵          | number of training batches
D         | 2            | dimensionality of latent space
N_layer   | 1            | RNN/LSTM number of layers
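For reference, the Table S1 values can be collected in a plain config dict. This is a transcription of the reported hyperparameters, not the authors' code; the key names are invented for readability.

```python
# Hyperparameters from Table S1, shared across all architectures.
hparams = {
    "dt_ms": 100,        # Euler integration step size
    "tau_ms": 100,       # neuronal time constant
    "n_neurons": 64,     # hidden units
    "sigma_input": 0.2,  # input noise standard deviation
    "trial_steps": 20,   # trial duration, in units of dt
    "lr_fixed": 1e-3,    # Adam learning rate, fixed-RT task
    "lr_free": 3e-3,     # Adam learning rate, free-RT task
    "batch_size": 16,
    "n_batches": 10**5,  # number of training batches
    "latent_dim": 2,
    "n_layers": 1,       # RNN/LSTM layers
}
```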