Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary

Authors: Takashi Morita

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Nonetheless, investigations through synthetic benchmarks reveal an advantage of coupling positional encoding and RNNs, especially for handling a large vocabulary that yields low-frequency tokens. Further scrutiny unveils that these low-frequency tokens destabilize the gradients of vanilla RNNs, and positional encoding resolves this instability. These results shed new light on the utility of positional encoding beyond its canonical role as a timekeeper for Transformers.
Researcher Affiliation | Academia | Takashi Morita (EMAIL), Academy of Emerging Sciences / Center for Mathematical Science and AI, Chubu University
Pseudocode | No | The paper describes methods in regular paragraph text and uses mathematical equations for definitions, but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | All the experiments were implemented in PyTorch (ver. 2.1.1; Paszke et al., 2017; 2019) and each training/test trial was executed on a single NVIDIA A100 GPU (with 80GB VRAM) hosted by the Academic Center for Computing and Media Studies, Kyoto University. The source code is available at https://github.com/tkc-morita/position-encoded_rnn.
Open Datasets | Yes | This section reports benchmark results for the language modeling task. Single-layer LSTMs with and without sinusoidal positional encoding were trained and tested on the WikiText-103 dataset (Merity et al., 2017).
Dataset Splits | Yes | Each of the five trials held out 1024 random sequences (= 65,536 tokens) for computing the test accuracy.
Hardware Specification | Yes | All the experiments were implemented in PyTorch (ver. 2.1.1; Paszke et al., 2017; 2019) and each training/test trial was executed on a single NVIDIA A100 GPU (with 80GB VRAM) hosted by the Academic Center for Computing and Media Studies, Kyoto University.
Software Dependencies | Yes | All the experiments were implemented in PyTorch (ver. 2.1.1; Paszke et al., 2017; 2019)
Experiment Setup | Yes | The models were trained for 300,000 iterations using the Adam optimizer (Kingma & Ba, 2015) with the parameters (β1, β2) := (0.9, 0.999) and no weight decay. The learning rate was linearly warmed up from 0.0 to 0.001 for the first 1,000 iterations, and then annealed according to the cosine schedule (Loshchilov & Hutter, 2017). The batch size was 512.
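The "sinusoidal positional encoding" quoted under Research Type and Open Datasets refers to the standard sin/cos scheme. As a minimal, stdlib-only sketch (the paper's exact injection point — added to or concatenated with the embeddings — is not specified in the excerpts above, so the final comment is an assumption):

```python
import math

def sinusoidal_positions(seq_len, dim):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017):
    even dimensions get sin, odd dimensions cos, with geometrically
    spaced wavelengths from 2*pi up to 10000*2*pi."""
    pe = [[0.0] * dim for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, dim, 2):
            angle = pos / (10000.0 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            if i + 1 < dim:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# In a position-encoded RNN, these vectors would be combined with the
# token embeddings (e.g. added elementwise) before the recurrent layer.
```

Note that position 0 always yields the vector (0, 1, 0, 1, ...), and each coordinate varies smoothly with the position index.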
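The Dataset Splits row reports 1024 random held-out sequences (65,536 tokens, i.e. 64 tokens per sequence) per trial. A hedged sketch of such a hold-out split (the function name, seeding, and sampling mechanism are illustrative assumptions, not the authors' code):

```python
import random

def hold_out_test(sequences, n_test=1024, seed=0):
    """Hold out n_test randomly chosen sequences for testing; the rest
    are kept for training. With 64-token sequences, 1024 held-out
    sequences correspond to 65,536 test tokens, as in the paper."""
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(sequences)), n_test))
    train = [s for i, s in enumerate(sequences) if i not in test_idx]
    test = [s for i, s in enumerate(sequences) if i in test_idx]
    return train, test
```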
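The Experiment Setup row describes linear warmup from 0.0 to 0.001 over 1,000 iterations followed by cosine annealing over the 300,000-iteration run. A minimal sketch of that schedule, assuming the cosine phase decays to zero at the final iteration (the exact endpoint is not stated in the excerpt):

```python
import math

WARMUP = 1_000      # warmup iterations (from the paper)
TOTAL = 300_000     # total training iterations (from the paper)
PEAK = 1e-3         # peak learning rate (from the paper)

def learning_rate(step):
    """Linear warmup 0.0 -> PEAK over WARMUP steps, then cosine
    annealing PEAK -> 0.0 over the remaining steps (assumed endpoint)."""
    if step < WARMUP:
        return PEAK * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return PEAK * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this is commonly assembled from `torch.optim.lr_scheduler.LambdaLR` or a warmup wrapper around `CosineAnnealingLR`; the closed form above makes the shape of the schedule explicit.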