Sharing Knowledge for Meta-learning with Feature Descriptions

Authors: Tomoharu Iwata, Atsutoshi Kumagai

NeurIPS 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. Evidence: "In our experiments, we demonstrate that the proposed method achieves better predictive performance than the existing meta-learning methods using a wide variety of real-world datasets provided by the statistical office of the EU and Japan."
Researcher Affiliation: Industry. Evidence: "Tomoharu Iwata, NTT Communication Science Laboratories, Kyoto, Japan, EMAIL; Atsutoshi Kumagai, NTT Computer and Data Science Laboratories, Tokyo, Japan, EMAIL."
Pseudocode: Yes. Evidence: "Algorithm 1: Meta-learning procedure of our model."
Open Source Code: No. The code is proprietary.
Open Datasets: Yes. Evidence: "For evaluating the proposed method, we used the following two data: e-Stat and Eurostat. ... The e-Stat data were obtained from the official statistics of Japan using API 1. The Eurostat data were obtained from the statistical office of the European Union using API 2." (Footnotes 1 and 2 provide the URLs: 1. https://www.e-stat.go.jp/en, 2. https://ec.europa.eu/eurostat/data/database.)
Dataset Splits: Yes. Evidence: "For each of the e-Stat and Eurostat data, we sampled 700 datasets for meta-training, 100 for meta-validation, and 200 for meta-test."
Hardware Specification: No. Evidence: "Table 7 shows the average computation time in seconds for meta-training on computers with 2.60GHz CPUs." This is not specific enough: no CPU model, memory, or other components are given.
Software Dependencies: No. Evidence: "We implemented the proposed method with PyTorch [15]." PyTorch is mentioned, but no version number is specified.
Experiment Setup: Yes. Evidence: "In the proposed model, we used a three-layered feed-forward neural network with 128 hidden and output units for the sentence, feature, and instance encoders, f_SE, f_FE, and f_IE, and a three-layered feed-forward neural network with 128 hidden units and a single output unit for mean function g. For the activation function, we used the rectified linear unit, ReLU(x) = max(0, x). For GP, we used RBF kernels, k(z, z') = α exp(-γ ||z - z'||^2) + β δ(z, z'), where α, β, and γ were kernel parameters to be meta-trained. We optimized our models using Adam [10] with learning rate 10^-3, batch dataset size 32, and dropout rate 0.1 [22]. The meta-validation datasets were used for early stopping, for which the maximum number of meta-training epochs was 5,000."
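The quoted setup can be sketched in PyTorch. This is our own illustrative code, not the authors' implementation; the names `make_encoder` and `RBFKernel` are assumptions, and parameters are stored as log-values purely to keep α, β, γ positive during meta-training:

```python
import torch
import torch.nn as nn


def make_encoder(in_dim: int, out_dim: int = 128, hidden: int = 128) -> nn.Sequential:
    """Three-layered feed-forward network with ReLU activations,
    as described for the encoders f_SE, f_FE, and f_IE (128 units)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class RBFKernel(nn.Module):
    """RBF kernel k(z, z') = alpha * exp(-gamma * ||z - z'||^2) + beta * delta(z, z')
    with meta-trainable parameters alpha, beta, gamma (kept positive via exp)."""

    def __init__(self) -> None:
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(1))
        self.log_beta = nn.Parameter(torch.zeros(1))
        self.log_gamma = nn.Parameter(torch.zeros(1))

    def forward(self, z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
        # Squared Euclidean distances between all pairs of rows.
        sq_dist = torch.cdist(z1, z2) ** 2
        k = self.log_alpha.exp() * torch.exp(-self.log_gamma.exp() * sq_dist)
        # delta(z, z') term: add beta on the diagonal when both inputs coincide.
        if z1.shape == z2.shape and torch.equal(z1, z2):
            k = k + self.log_beta.exp() * torch.eye(z1.shape[0])
        return k
```

Such modules would then be meta-trained end to end with `torch.optim.Adam(params, lr=1e-3)`, matching the reported optimizer and learning rate.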