Disentangling Representations through Multi-task Learning

Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel

ICLR 2025

Reproducibility assessment. Each entry lists the variable, the assessed result, and the supporting excerpt from the paper.
Research Type: Experimental. "We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve multi-task evidence accumulation classification tasks..."; "We experimentally validate these predictions in RNNs trained on multi-task classification..." (Section 5, Experiments)
Researcher Affiliation: Academia. Pantelis Vafidis and Aman Bhargava, Computation and Neural Systems, California Institute of Technology; Antonio Rangel, Humanities and Social Sciences, California Institute of Technology.
Pseudocode: No. The paper contains no explicitly labeled pseudocode or algorithm blocks. It gives mathematical equations for the RNN dynamics (Equation 5) and a graphical model (Figure S7), but these are not presented as structured algorithms.
Open Source Code: Yes. "All code used to generate the results can be found in https://github.com/panvaf/DisentangleRes."
Open Datasets: No. The data are generated synthetically rather than drawn from a public dataset: "A ground truth x is sampled and Gaussian noise is added to arrive at X(t). The task is to report whether x lies above (1) or below (0) each of the classification lines (color matches the corresponding boolean variable in y), given the noisy and non-linearly transformed samples f(X(1)), ..., f(X(t))."
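The synthetic data-generation scheme quoted above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the line angles, sampling ranges, and the names `latent_dim`, `sigma`, and `n_steps` are assumptions.

```python
import numpy as np

# Illustrative sketch of the synthetic task: a latent ground truth is sampled,
# Gaussian noise is added at each time step, and each boolean target asks
# whether the latent lies above one of several classification lines.
rng = np.random.default_rng(0)

latent_dim, sigma, n_steps = 2, 0.2, 20

x = rng.uniform(-1.0, 1.0, size=latent_dim)             # ground-truth latent
X = x + sigma * rng.normal(size=(n_steps, latent_dim))  # noisy samples X(t)

# Normals of three hypothetical classification lines through the origin
angles = np.array([0.0, np.pi / 4, np.pi / 2])
normals = np.stack([-np.sin(angles), np.cos(angles)], axis=1)

y_true = (normals @ x > 0).astype(int)  # 1 if x lies above each line, else 0
```

A non-linear observation map f would then be applied to each row of X before it reaches the network.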
Dataset Splits: Yes. "To quantify the disentanglement of the representations after learning, we evaluate regression generalization by training a linear decoder to predict the ground truth x while network weights are frozen. We perform out-of-distribution 4-fold cross-validation, i.e. train the decoder on 3 out of 4 quadrants and test in the remaining quadrant (Appendix A.2 for details)."
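The quadrant-based out-of-distribution cross-validation can be sketched as follows. The hidden states `H` here are a stand-in (a random linear embedding of the latents) for the frozen network's representations, and the least-squares decoder is an illustrative choice; only the 3-train / 1-test quadrant split mirrors the quoted procedure.

```python
import numpy as np

# OOD 4-fold cross-validation over quadrants: fit a linear decoder on three
# quadrants of latent space, evaluate R^2 on the held-out quadrant.
rng = np.random.default_rng(1)

D, N_neu, n_trials = 2, 64, 2000
x = rng.uniform(-1.0, 1.0, size=(n_trials, D))         # ground-truth latents
W = rng.normal(size=(D, N_neu))
H = x @ W + 0.05 * rng.normal(size=(n_trials, N_neu))  # stand-in hidden states

# Quadrant index 0..3 from the signs of the two latent coordinates
quadrant = (x[:, 0] > 0).astype(int) + 2 * (x[:, 1] > 0).astype(int)

r2_scores = []
for q in range(4):
    train, test = quadrant != q, quadrant == q
    coef, *_ = np.linalg.lstsq(H[train], x[train], rcond=None)
    pred = H[test] @ coef
    ss_res = ((x[test] - pred) ** 2).sum()
    ss_tot = ((x[test] - x[test].mean(axis=0)) ** 2).sum()
    r2_scores.append(1.0 - ss_res / ss_tot)
```

For a disentangled (linearly decodable) representation, all four held-out R² scores stay high; entangled representations fail to generalize to the unseen quadrant.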
Hardware Specification: No. The paper specifies the model types used (RNNs, LSTMs, GPT-2 transformers) and their architectural details, but gives no hardware specifics such as GPU model, CPU type, or memory.
Software Dependencies: No. "The network is trained with a cross-entropy loss and Adam default settings, except learning rate η₀ = 10⁻³, to produce the target outputs y(x*)." While Adam is mentioned, no version numbers are provided for it or for any other software library (e.g., PyTorch, TensorFlow, CUDA).
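The stated objective, cross-entropy minimized with Adam at its default settings except η₀ = 10⁻³, can be illustrated on a toy logistic readout. This is a hedged numpy-only sketch, not the paper's training loop; the batch, features, and step count are arbitrary.

```python
import numpy as np

# Toy illustration: binary cross-entropy loss minimized with Adam
# (default betas/eps, lr = 1e-3) on a linear readout.
rng = np.random.default_rng(2)

def bce(logits, targets):
    # Numerically stable binary cross-entropy with logits
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

X = rng.normal(size=(16, 8))       # batch size 16, as in Table S1
y = (X[:, 0] > 0).astype(float)    # one boolean target per trial
w = np.zeros(8)

lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8
m, v = np.zeros_like(w), np.zeros_like(w)
losses = []
for t in range(1, 201):
    logits = X @ w
    losses.append(bce(logits, y))
    grad = X.T @ (1 / (1 + np.exp(-logits)) - y) / len(y)
    m = b1 * m + (1 - b1) * grad                 # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2            # second-moment estimate
    w -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
```

In practice the same objective would be expressed in a framework such as PyTorch (`torch.optim.Adam` with `lr=1e-3`), but the review's point stands: no framework versions are pinned.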
Experiment Setup: Yes. "Table S1 summarizes all hyperparameters and their values, which are shared across all architectures."

Parameter | Value        | Explanation
Δt        | 100 ms       | Euler integration step size
τ         | 100 ms       | neuronal time constant
N_neu     | 64           | number of hidden neurons
σ         | 0.2          | input noise standard deviation
T         | 20           | trial duration (in Δt's)
η₀        | 0.001/0.003  | Adam learning rate (fixed/free RT)
B         | 16           | batch size
N_batch   | 10⁵          | number of training batches
D         | 2            | dimensionality of latent space
N_layer   | 1            | RNN/LSTM number of layers
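For reference, the Table S1 values can be collected in a plain config dict. This is a transcription of the reported hyperparameters, not the authors' code; the key names are invented for readability.

```python
# Hyperparameters from Table S1, shared across all architectures.
hparams = {
    "dt_ms": 100,        # Euler integration step size
    "tau_ms": 100,       # neuronal time constant
    "n_neurons": 64,     # hidden units
    "sigma_input": 0.2,  # input noise standard deviation
    "trial_steps": 20,   # trial duration, in units of dt
    "lr_fixed": 1e-3,    # Adam learning rate, fixed-RT task
    "lr_free": 3e-3,     # Adam learning rate, free-RT task
    "batch_size": 16,
    "n_batches": 10**5,  # number of training batches
    "latent_dim": 2,
    "n_layers": 1,       # RNN/LSTM layers
}
```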