Automatically Composing Representation Transformations as a Means for Generalization
Authors: Michael Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show on a symbolic and a high-dimensional domain that our compositional approach can generalize to more complex problems than the learner has previously encountered, whereas baselines that are not explicitly compositional do not. |
| Researcher Affiliation | Academia | Michael B. Chang, Electrical Engineering and Computer Science, University of California, Berkeley, USA; Abhishek Gupta, Electrical Engineering and Computer Science, University of California, Berkeley, USA; Sergey Levine, Electrical Engineering and Computer Science, University of California, Berkeley; Thomas L. Griffiths, Psychology and Cognitive Science, Princeton University, USA |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/mbchang/crl |
| Open Datasets | Yes | recognizing spatially transformed MNIST digits (LeCun et al., 1998) |
| Dataset Splits | Yes | We randomly choose 16 of these 20 for training, 2 for validation, 2 for test, as shown in Figure 4 (center). |
| Hardware Specification | No | The paper mentions 'computing support from Amazon, NVIDIA, and Google' but does not specify exact hardware models (e.g., GPU, CPU models, or specific cloud instances with their specs). |
| Software Dependencies | No | All learners are implemented in PyTorch (Paszke et al., 2017), but no specific version number for PyTorch or any other software dependency is provided. |
| Experiment Setup | Yes | The loss is backpropagated through the modules, which are trained with Adam (Kingma & Ba, 2014). The controller receives a sparse reward derived from the loss at the end of the computation, and a small cost for each computational step. The model is trained with proximal policy optimization (Schulman et al., 2017). We found via a grid search k = 1024 and k = 256. Through an informal search whose heuristic was performance on the training set, we settled on updating the curriculum of CRL every 10^5 episodes and updating the curriculum of the RNN every 5 × 10^4 episodes. The step penalty was found by a scale search over {−1, 0.1, 0.01, 0.001} and 0.01 was a penalty that we found balanced accuracy and computation time to a reasonable degree during training. |
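The dataset-splits row reports that 16 of 20 transformations were chosen for training, 2 for validation, and 2 for test. A minimal sketch of such a random partition (the function name and seed are illustrative, not from the paper):

```python
import random

def split_transformations(transformations, seed=0):
    """Randomly partition 20 transformations into 16 train / 2 val / 2 test,
    mirroring the split sizes reported in the paper. The seed and the
    function name are illustrative assumptions, not the authors' code."""
    assert len(transformations) == 20
    rng = random.Random(seed)
    shuffled = list(transformations)
    rng.shuffle(shuffled)
    return shuffled[:16], shuffled[16:18], shuffled[18:]

train, val, test = split_transformations(list(range(20)))
```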
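The experiment-setup row describes the controller's reward: a sparse reward derived from the loss at the end of the computation, minus a small cost (0.01) per computational step. A hedged sketch of that reward shaping, assuming the loss-derived term is simply the negated final loss (the paper does not spell out the exact mapping; the 0.01 penalty is the reported value):

```python
def episode_reward(final_loss, num_steps, step_penalty=0.01):
    """Sparse episode reward for the controller: a term derived from the
    task loss at the end of the computation, minus a per-step cost.
    Using -final_loss as the loss-derived reward is an assumption;
    the 0.01 step penalty is the value the paper reports."""
    return -final_loss - step_penalty * num_steps
```

With this shaping, longer computations are only favored when they reduce the final loss by more than 0.01 per extra step, which matches the paper's stated trade-off between accuracy and computation time.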