Recurrent networks, hidden states and beliefs in partially observable environments
Authors: Gaspard Lambrechts, Adrien Bolland, Damien Ernst
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we show empirically that recurrent neural networks trained to approximate such value functions internally filter the posterior probability distribution of the current state given the history, called the belief. More precisely, we show that, as a recurrent neural network learns the Q-function, its hidden states become more and more correlated with the beliefs of state variables that are relevant to optimal control. We conduct this investigation by studying the performance of the different agents with regard to the mutual information (MI) between their hidden states and the belief. Section 4 displays the main results obtained for the previously mentioned POMDPs. |
| Researcher Affiliation | Academia | Gaspard Lambrechts EMAIL Montefiore Institute, University of Liège; Adrien Bolland EMAIL Montefiore Institute, University of Liège; Damien Ernst EMAIL Montefiore Institute, University of Liège and LTCI, Telecom Paris, Institut Polytechnique de Paris. All listed institutions are academic. |
| Pseudocode | Yes | The DRQN training procedure is detailed in Algorithm 1. This process, illustrated in Algorithm 2, guarantees that the successive sets S_0, …, S_H have (weighted) samples following the probability distributions b_0, …, b_H defined by equation (8). The MINE algorithm proposes to maximise i_ϕ(X; Y) by stochastic gradient ascent over batches from the two sets of samples, as detailed in Algorithm 3. |
| Open Source Code | No | The information is not present in the paper. There are no explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We focus on POMDPs for which the models are known. The benchmark problems chosen are the T-Maze environments (Bakker, 2001) and the Mountain Hike environments (Igl et al., 2018). These are standard and well-cited environments. |
| Dataset Splits | No | The paper describes reinforcement learning environments (T-Maze and Mountain Hike) where data is generated through interaction, rather than using fixed datasets with predefined train/test/validation splits. Therefore, the concept of specific dataset splits in the traditional supervised learning sense is not applicable or provided. |
| Hardware Specification | No | Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and by the Walloon Region. This statement mentions the computing facility but does not provide specific hardware details like GPU/CPU models, processors, or memory specifications. |
| Software Dependencies | No | The parameters θ are updated with the Adam algorithm (Kingma & Ba, 2014). The paper mentions the Adam optimizer but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation. |
| Experiment Setup | Yes | The hyperparameters of the DRQN algorithm are given in Table 1 and the hyperparameters of the MINE algorithm are given in Table 2. These tables specify values for RNN layers, hidden state size, replay buffer capacity, target update period, exploration rate, batch size, Adam learning rate, number of epochs, etc. |
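The belief that the paper's Algorithm 2 tracks with weighted samples is the exact Bayes filter of a POMDP: b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s). A minimal sketch of that exact update for a discrete toy POMDP (the transition and observation matrices below are hypothetical illustrative numbers, not the paper's T-Maze or Mountain Hike models):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact Bayes filter for a discrete POMDP:
    b'(s') ∝ O(o | s') * sum_s T(s' | s, a) * b(s)."""
    predicted = T[a].T @ b          # prediction step: sum_s T(s'|s,a) b(s)
    unnorm = O[:, o] * predicted    # correction step: weight by P(o | s')
    return unnorm / unnorm.sum()    # renormalise to a distribution

# Hypothetical 2-state, 1-action, 2-observation POMDP (toy numbers)
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])       # T[a][s][s']
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])         # O[s'][o]

b0 = np.array([0.5, 0.5])          # uniform prior over states
b1 = belief_update(b0, a=0, o=1, T=T, O=O)
```

Recurrent value networks receive only (a, o) pairs, so the claim studied in the paper is that a trained RNN's hidden state comes to encode the same information as `b1` computed above.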
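The MINE estimator used to measure the hidden-state/belief correlation maximises the Donsker-Varadhan lower bound I(X; Y) ≥ E_{p(x,y)}[T(x, y)] − log E_{p(x)p(y)}[exp(T(x, y))] over a learned statistics network T_ϕ. A minimal numpy sketch of that bound with a hand-picked (not learned) critic on a correlated Gaussian pair, chosen because the true MI is known in closed form; the critic and all constants below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def dv_bound(T, x_joint, y_joint, x_marg, y_marg):
    """Donsker-Varadhan lower bound that MINE maximises:
    I(X; Y) >= E_p(x,y)[T(x, y)] - log E_p(x)p(y)[exp(T(x, y))]."""
    joint_term = np.mean(T(x_joint, y_joint))
    marg_term = np.log(np.mean(np.exp(T(x_marg, y_marg))))
    return joint_term - marg_term

rng = np.random.default_rng(0)
n, rho = 5000, 0.8
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y_shuffled = rng.permutation(y)  # shuffling breaks the dependence -> marginal samples

# Hand-picked quadratic critic (hypothetical); MINE learns T_phi by gradient ascent
T = lambda a, b: 0.3 * a * b

mi_lower = dv_bound(T, x, y, x, y_shuffled)
mi_true = -0.5 * np.log(1 - rho**2)  # closed-form MI of a Gaussian pair
```

With a learned critic the bound tightens toward `mi_true`; with this fixed critic it is only a loose lower estimate, which is enough to illustrate why the paper optimises T_ϕ by stochastic gradient ascent (Algorithm 3).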