Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Gaussian Approximation for Bias Reduction in Q-Learning
Authors: Carlo D'Eramo, Andrea Cini, Alessandro Nuara, Matteo Pirotta, Cesare Alippi, Jan Peters, Marcello Restelli
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically evaluate our algorithms in a large set of heterogeneous problems, encompassing discrete and continuous, low and high dimensional, deterministic and stochastic environments. Experimental results show the effectiveness of the Weighted Estimator in controlling the bias of the estimate, resulting in better performance than representative baselines and robust learning w.r.t. a large set of diverse environments. Section 6 is dedicated to the extensive evaluation of our RL methodologies, where we show and discuss the empirical evidence of the benefit of WE w.r.t. ME, DE, and MME. |
| Researcher Affiliation | Collaboration | 1TU Darmstadt, Darmstadt, Germany 2The Swiss AI Lab IDSIA, Lugano, Switzerland 3Università della Svizzera italiana, Lugano, Switzerland 4Politecnico di Milano, Milano, Italy 5Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 Weighted Q-learning, Algorithm 2 Weighted FQI (finite actions), Algorithm 3 Weighted FQI (continuous actions), Algorithm 4 Weighted DQN |
| Open Source Code | Yes | The algorithms and the experimental setup have been developed using the open-source RL libraries MushroomRL (D'Eramo et al., 2021) and OpenAI Gym (Brockman et al., 2016). Full description of the complete experimental setup and further results can be found in the appendix. ... Open source code at https://github.com/MushroomRL/mushroom-rl. |
| Open Datasets | Yes | First we run a proof of concept experiment on the Lunar Lander environment (Brockman et al., 2016). Then we test WDQN in three environments of the Minatar benchmark (Young and Tian, 2019). ... Finally we perform a set of experiments on two Atari games from the Arcade Learning Environment (ALE) (Bellemare et al., 2013)... |
| Dataset Splits | No | The paper describes reinforcement learning experiments where data is collected through interaction with environments (Lunar Lander, Minatar, Atari games). It details training protocols, such as episode length limits and evaluation frequencies, but does not specify static training/validation/test dataset splits in terms of percentages, sample counts, or predefined partitioned files, as is typical for supervised learning tasks. Data is generated dynamically by the agent's interaction with the environment and stored in a replay memory. |
| Hardware Specification | Yes | The experiments were run on a server with an Intel Xeon Silver 4116 CPU with NVIDIA Titan V GPUs. |
| Software Dependencies | No | For the implementation of the algorithms and the simulation environments we rely on the following open-source libraries: numpy (Harris et al., 2020); MushroomRL (D'Eramo et al., 2021); Gym (Brockman et al., 2016); ALE (Bellemare et al., 2013); MinAtar (Young and Tian, 2019); PyTorch (Paszke et al., 2019). No specific version numbers are provided for these software components. |
| Experiment Setup | Yes | The three agents use the exact same network architecture, two hidden layers with 100 units and ReLU activation, the Adam optimizer with a learning rate of 3e-4, a target update frequency of 300, a replay buffer of 10k transitions and ε-greedy exploration with ε linearly decreasing from 1 to 0.01 in the first 1,000 steps. WDQN uses Concrete Dropout in each hidden layer. Table 1 reports the hyperparameters used to train the agents. ... Table 2 shows the relevant hyperparameters. ... Table 3 reports a list of additional relevant hyperparameters. |
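The paper's Weighted Estimator builds on a Gaussian approximation: each sample-mean action value is treated as a Gaussian, and the estimate of the maximum is a weighted sum of the means, where each weight is the probability that the corresponding action has the largest true value. A minimal sketch of that idea, using Monte Carlo sampling from independent Gaussians (the function name and sampling approach are illustrative, not the paper's implementation):

```python
import numpy as np

def weighted_estimator(means, stds, n_samples=100_000, seed=0):
    """Monte Carlo sketch of the Weighted Estimator (WE).

    Each Q-value estimate is modeled as an independent Gaussian with the
    given mean and standard deviation. The weight w_a approximates
    P(action a has the largest true value); WE = sum_a w_a * mean_a.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Draw joint samples of all action values and count argmax wins.
    draws = rng.normal(means, stds, size=(n_samples, len(means)))
    counts = np.bincount(draws.argmax(axis=1), minlength=len(means))
    weights = counts / n_samples
    return weights, float(weights @ means)
```

Unlike the Maximum Estimator (which takes `means.max()` and is positively biased) the weighted sum shrinks toward the other means in proportion to the uncertainty of the estimates.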
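The exploration schedule quoted above (ε decreasing linearly from 1 to 0.01 over the first 1,000 steps, then held constant) can be sketched as a small helper; the function name and defaults are illustrative:

```python
def epsilon_schedule(step, eps_start=1.0, eps_end=0.01, decay_steps=1_000):
    """Linearly anneal the exploration rate over the first decay_steps
    environment steps, then hold it at eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```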