Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Gaussian Approximation for Bias Reduction in Q-Learning

Authors: Carlo D'Eramo, Andrea Cini, Alessandro Nuara, Matteo Pirotta, Cesare Alippi, Jan Peters, Marcello Restelli

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we empirically evaluate our algorithms in a large set of heterogeneous problems, encompassing discrete and continuous, low and high dimensional, deterministic and stochastic environments. Experimental results show the effectiveness of the Weighted Estimator in controlling the bias of the estimate, resulting in better performance than representative baselines and robust learning w.r.t. a large set of diverse environments. Section 6 is dedicated to the extensive evaluation of our RL methodologies, where we show and discuss the empirical evidence of the benefit of WE w.r.t. ME, DE, and MME.
Researcher Affiliation Collaboration 1 TU Darmstadt, Darmstadt, Germany; 2 The Swiss AI Lab IDSIA, Lugano, Switzerland; 3 Università della Svizzera italiana, Lugano, Switzerland; 4 Politecnico di Milano, Milano, Italy; 5 Facebook AI Research
Pseudocode Yes Algorithm 1 Weighted Q-learning, Algorithm 2 Weighted FQI (finite actions), Algorithm 3 Weighted FQI (continuous actions), Algorithm 4 Weighted DQN
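The common step in the listed algorithms is the paper's Weighted Estimator (WE): each action's Q-value estimate is modeled as a Gaussian, and the target is a weighted sum of the means, where each weight is the probability that the corresponding action is the maximizer. A minimal Monte Carlo sketch of that weighting step (the function name and sampling approach are illustrative, not the paper's exact implementation, which computes the weights via the Gaussian approximation analytically):

```python
import numpy as np

def weighted_estimator(means, sigmas, n_samples=20000, seed=None):
    """Sketch of the Weighted Estimator for the maximum expected value.

    Each action's Q-value estimate is treated as an independent Gaussian
    N(means[a], sigmas[a]^2). The weight w_a approximates the probability
    that action a attains the maximum; WE = sum_a w_a * means[a].
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Sample all actions jointly and count how often each one is argmax.
    samples = rng.normal(means, sigmas, size=(n_samples, means.size))
    wins = np.bincount(samples.argmax(axis=1), minlength=means.size)
    weights = wins / n_samples
    return float(weights @ means), weights
```

When one action's mean clearly dominates, its weight approaches 1 and WE coincides with the plain Maximum Estimator; under high uncertainty the weights spread across actions, which is how WE curbs the overestimation bias of taking a hard max over noisy estimates.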
Open Source Code Yes The algorithms and the experimental setup have been developed using the open-source RL libraries MushroomRL5 (D'Eramo et al., 2021) and OpenAI Gym (Brockman et al., 2016). Full description of the complete experimental setup and further results can be found in the appendix. ... 5. Open source code at https://github.com/MushroomRL/mushroom-rl.
Open Datasets Yes First we run a proof of concept experiment on the Lunar Lander environment (Brockman et al., 2016). Then we test WDQN in three environments of the Minatar benchmark (Young and Tian, 2019). ... Finally we perform a set of experiments on two Atari games from the Arcade Learning Environment (ALE) (Bellemare et al., 2013)...
Dataset Splits No The paper describes reinforcement learning experiments where data is collected through interaction with environments (Lunar Lander, Minatar, Atari games). It details training protocols, such as episode length limits and evaluation frequencies, but does not specify static training/validation/test dataset splits in terms of percentages, sample counts, or predefined partitioned files, as is typical for supervised learning tasks. Data is generated dynamically by the agent's interaction with the environment and stored in a replay memory.
Hardware Specification Yes The experiments were run on a server with an Intel Xeon Silver 4116 CPU with NVIDIA Titan V GPUs.
Software Dependencies No For the implementation of the algorithms and the simulation environments we rely on the following open-source libraries: numpy (Harris et al., 2020); MushroomRL (D'Eramo et al., 2021); Gym (Brockman et al., 2016); ALE (Bellemare et al., 2013); MinAtar (Young and Tian, 2019); PyTorch (Paszke et al., 2019). No specific version numbers are provided for these software components.
Experiment Setup Yes The three agents use the same exact network architecture, two hidden layers with 100 units and relu activation, the Adam optimizer with a learning rate of 3e-4, a target update frequency of 300, a replay buffer of 10k transitions and ε-greedy exploration with ε linearly decreasing from 1 to 0.01 in the first 1,000 steps. WDQN uses Concrete Dropout in each hidden layer. Table 1 reports the hyperparameters used to train the agents. ... Table 2 shows the relevant hyperparameters. ... Table 3 reports a list of additional relevant hyperparameters.
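The ε-greedy exploration schedule quoted above (ε falling linearly from 1 to 0.01 over the first 1,000 steps, then held constant) can be sketched as follows; the function name and defaults are illustrative, not taken from the paper's code:

```python
def linear_epsilon(step, eps_start=1.0, eps_end=0.01, decay_steps=1000):
    """Linearly anneal the exploration rate over the first decay_steps
    environment steps, then hold it at eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

For example, ε is 1.0 at step 0, roughly 0.505 halfway through the decay, and stays at 0.01 from step 1,000 onward.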