Disentangling Representations through Multi-task Learning
Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve multi-task evidence accumulation classification tasks... We experimentally validate these predictions in RNNs trained on multi-task classification... 5 EXPERIMENTS |
| Researcher Affiliation | Academia | Pantelis Vafidis , Aman Bhargava Computation and Neural Systems California Institute of Technology EMAIL Antonio Rangel Humanities and Social Sciences California Institute of Technology EMAIL |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It describes mathematical equations for RNN dynamics (Equation 5) and illustrates a graphical model (Figure S7), but these are not presented as structured algorithms. |
| Open Source Code | Yes | All code used to generate the results can be found in https://github.com/panvaf/DisentangleRes. |
| Open Datasets | No | A ground truth x* is sampled and Gaussian noise is added to arrive at X(t). The task is to report whether x* lies above (1) or below (0) each of the classification lines (color matches corresponding boolean variable in y), given the noisy and non-linearly transformed samples f(X(1)), ..., f(X(t)). |
| Dataset Splits | Yes | To quantify the disentanglement of the representations after learning, we evaluate regression generalization by training a linear decoder to predict the ground truth x while network weights are frozen. We perform out-of-distribution 4-fold cross-validation, i.e. train the decoder on 3 out of 4 quadrants and test in the remaining quadrant (Appendix A.2 for details). |
| Hardware Specification | No | The paper specifies the types of models used (RNNs, LSTMs, GPT-2 transformers) and their architectural details, but it does not provide specific hardware specifications like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The network is trained with a cross-entropy loss and Adam default settings, except learning rate η0 = 10^-3, to produce the target outputs y(x*). While Adam is mentioned, no specific version number for Adam or any other software library (e.g., PyTorch, TensorFlow, CUDA) is provided. |
| Experiment Setup | Yes | Table S1 summarizes all hyperparameters and their values, which are shared across all architectures: Δt = 100 ms (Euler integration step size); τ = 100 ms (neuronal time constant); N_neu = 64 (number of hidden neurons); σ = 0.2 (input noise standard deviation); T = 20 (trial duration, in Δt's); η0 = 0.001/0.003 (Adam learning rate, fixed/free RT); B = 16 (batch size); N_batch = 10^5 (number of training batches); D = 2 (dimensionality of latent space); N_layer = 1 (RNN/LSTM number of layers). |
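The Dataset Splits row describes the paper's disentanglement metric: a linear decoder is trained on frozen hidden states to predict the ground-truth latents using 3 of the 4 quadrants of latent space, and tested on the held-out quadrant. A minimal numpy-only sketch of that out-of-distribution 4-fold cross-validation (the function name, array shapes, and least-squares decoder are our illustrative assumptions, not the authors' code):

```python
import numpy as np

def ood_quadrant_cv(hidden, x_true):
    """Illustrative sketch of out-of-distribution 4-fold CV: fit a linear
    decoder from frozen hidden states `hidden` (n_trials, n_neurons) to the
    2-D ground-truth latents `x_true` (n_trials, 2) on 3 of the 4 quadrants,
    then score R^2 on the held-out quadrant. Returns one score per fold."""
    quad = 2 * (x_true[:, 0] > 0) + (x_true[:, 1] > 0)  # quadrant id 0..3
    scores = []
    for q in range(4):
        tr, te = quad != q, quad == q
        # Least-squares linear decoder with a bias column
        A = np.c_[hidden[tr], np.ones(tr.sum())]
        W, *_ = np.linalg.lstsq(A, x_true[tr], rcond=None)
        pred = np.c_[hidden[te], np.ones(te.sum())] @ W
        resid = ((x_true[te] - pred) ** 2).sum()
        total = ((x_true[te] - x_true[te].mean(axis=0)) ** 2).sum()
        scores.append(1.0 - resid / total)  # coefficient of determination
    return scores

# Sanity check on synthetic, perfectly linear (disentangled) hidden states:
rng = np.random.default_rng(0)
x = rng.standard_normal((400, 2))
h = x @ rng.standard_normal((2, 64))  # hidden states linear in the latents
scores = ood_quadrant_cv(h, x)
```

On such a linear representation the held-out-quadrant R^2 is near 1 for every fold; an entangled (non-linear) representation would generalize poorly to the unseen quadrant, which is what makes this an out-of-distribution test.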