Editable Neural Networks
Authors: Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, Artem Babenko
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of this method on large-scale image classification and machine translation tasks. |
| Researcher Affiliation | Collaboration | Anton Sinitsin (Yandex; National Research University Higher School of Economics); Vsevolod Plokhotnyuk (National Research University Higher School of Economics); Dmitry Pyrkin (National Research University Higher School of Economics); Sergei Popov (Yandex); Artem Babenko (Yandex; National Research University Higher School of Economics) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available online at https://github.com/xtinkt/editable |
| Open Datasets | Yes | First, we experiment on image classification with the small CIFAR-10 dataset with standard train/test splits (Krizhevsky et al.). ... Here we experiment with the ILSVRC image classification task (Deng et al. (2009)). ... We consider the IWSLT 2014 German-English translation task with the standard training/test splits (Cettolo et al. (2015)). |
| Dataset Splits | Yes | First, we experiment on image classification with the small CIFAR-10 dataset with standard train/test splits (Krizhevsky et al.). ... We measure the drawdown on the full ILSVRC validation set of 50,000 images. ... We consider the IWSLT 2014 German-English translation task with the standard training/test splits (Cettolo et al. (2015)). |
| Hardware Specification | Yes | In all cases Editable Fine-Tuning took under 48 hours on a single GeForce 1080 Ti GPU while a single edit requires less than 150 ms. |
| Software Dependencies | Yes | We use Transformer configuration transformer_iwslt_de_en from Fairseq v0.8.0 (Ott et al. (2019)) |
| Experiment Setup | Yes | All models trained on this dataset follow the ResNet-18 (He et al. (2015)) architecture and use the Adam optimizer (Kingma & Ba (2014)) with default hyperparameters. ... We set the learning rate to 10^-5 for the pre-existing layers and 10^-3 for the extra block. ... We use the SGD optimizer with momentum µ=0.9. ... We train the Transformer (Vaswani et al. (2017)) model similar to transformer-base configuration, optimized for IWSLT De-En task. |
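The quoted setup mentions SGD with momentum µ=0.9. As a minimal illustrative sketch (not the paper's implementation; the toy quadratic loss, step count, and learning rate below are assumptions), the heavy-ball momentum update it refers to looks like this:

```python
def sgd_momentum_step(w, v, grad, lr=1e-2, mu=0.9):
    """One SGD update with momentum: v <- mu*v - lr*grad; w <- w + v.

    mu=0.9 matches the momentum value quoted in the experiment setup;
    lr and the problem below are illustrative assumptions only.
    """
    v = mu * v - lr * grad
    return w + v, v

# Toy example: minimize f(w) = w^2 (gradient 2w), starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, v, grad=2.0 * w, lr=1e-2, mu=0.9)
# After 200 steps w has converged close to the minimum at 0.
```

With momentum, the velocity `v` accumulates past gradients, which is why the effective step size is larger than `lr` alone (roughly `lr / (1 - mu)` in steady state).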