Evolving Decomposed Plasticity Rules for Information-Bottlenecked Meta-Learning

Authors: Fan Wang, Hao Tian, Haoyi Xiong, Hua Wu, Jie Fu, Yang Cao, Yu Kang, Haifeng Wang

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our algorithms are tested in challenging random 2D maze environments, where the agents have to use their past experiences to shape the neural connections and improve their performance in the future. The results of our experiment validate the following ...
Researcher Affiliation | Collaboration | 1 Baidu Inc.; 2 University of Science and Technology of China; 3 Beijing Academy of Artificial Intelligence
Pseudocode | Yes | Algorithm 1: Inner-Loop Learning
Open Source Code | Yes | Source code available at https://github.com/WorldEditors/EvolvingPlasticANN
Open Datasets | Yes | We validate the proposed method in Meta Maze2D (Wang, 2021), an open-source maze simulator that can generate maze architectures, start positions, and goals at random.
Dataset Splits | Yes | For meta-training, each generation includes g = 360 genotypes evaluated on |Ttra| = 12 tasks. ... Every 100 generations we add a validating phase by evaluating the current genotype on |Tvalid| = 1024 validating tasks. ... The testing tasks include 9x9 mazes (Figure 3 (a)), 15x15 mazes (Figure 3 (b)), and 21x21 mazes (Figure 3 (c)), sampled in advance. There are |Ttst| = 2048 tasks for each level of mazes.
Hardware Specification | Yes | The genotypes are distributed to 360 CPUs to execute the inner loops.
Software Dependencies | No | The paper mentions the simulator Meta Maze2D (Wang, 2021) and general components such as a plastic RNN and an LSTM, but does not specify versions of any programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | For meta-training, each generation includes g = 360 genotypes evaluated on |Ttra| = 12 tasks. ... The variance of the noises in Seq-CMA-ES is initially set to 0.01. ... Meta-training goes for at least 15,000 generations. ... The agents acquire a reward of 1.0 by reaching the goal and 0.01 in other cases. Each episode terminates upon reaching the goal, or at the maximum of 200 steps. A life cycle has a total of 8 episodes. ... For the outer-loop optimizer (Seq-CMA-ES), we used an initial step size of 0.01, and the covariance C = I for all the compared methods.