Successor Feature Representations
Authors: Chris Reinke, Xavier Alameda-Pineda
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally compare SFRQL in tasks with linear and general reward functions, and for tasks with discrete and continuous features to standard Q-learning and the classical SF framework, demonstrating the interest and advantage of SFRQL. We evaluated SFRQL in two environments. The first has discrete features. ... The second environment, the racer environment, evaluates the agents in tasks with continuous features. |
| Researcher Affiliation | Academia | Chris Reinke EMAIL Robot Learn INRIA Grenoble, LJK, UGA; Xavier Alameda-Pineda EMAIL Robot Learn INRIA Grenoble, LJK, UGA |
| Pseudocode | Yes | Algorithm 1: Q-learning (QL); Algorithm 2: Classical SF Q-learning (SF) (Barreto et al., 2017); Algorithm 3: Model-free SFRQL (SFR); Algorithm 4: One Step SF-Model SFRQL (MB SFR); Algorithm 5: Model-free SFRQL for Continuous Features (CSFR) |
| Open Source Code | Yes | Source code at https://gitlab.inria.fr/robotlearn/sfr_learning |
| Open Datasets | No | The environment consists of 4 connected rooms (Fig. 1, a). The agent starts an episode in position S and has to learn to reach goal position G. During an episode, the agent collects objects to gain further rewards. ... We further evaluated the agents in a 2D environment with continuous features (Fig. 2, a). |
| Dataset Splits | No | Each task was executed for 20,000 steps, and the average performance over 10 runs per algorithm was measured. Each agent was evaluated on 40 tasks. The agents experienced the tasks sequentially, each for 1000 episodes (200,000 steps per task). |
| Hardware Specification | Yes | Experiments were conducted on a cluster with a variety of node types (Xeon SKL Gold 6130 at 2.10 GHz, Xeon SKL Gold 5218 at 2.30 GHz, Xeon SKL Gold 6126 at 2.60 GHz, Xeon SKL Gold 6244 at 3.60 GHz, each with 192 GB RAM, no GPU). |
| Software Dependencies | No | We used PyTorch for the computation of gradients and its stochastic gradient descent (SGD) procedure for updating the parameters. The parameters H and w_{i=1,...,20} were optimized with Adam (learning rate of 0.003). |
| Experiment Setup | Yes | Each task was executed for 20,000 steps, and the average performance over 10 runs per algorithm was measured. ... The probability for random actions of the ϵ-greedy action selection was set to ϵ = 0.15 and the discount rate to γ = 0.95. The initial weights θ for the function approximators were randomly sampled from a normal distribution, θ_init ∼ N(µ = 0, σ = 0.01). ... A grid search over the learning rates of all algorithms was performed. Each learning rate was evaluated for three different settings, which are listed in Table 1. |
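The reported setup (ϵ-greedy selection with ϵ = 0.15, discount γ = 0.95, weights drawn from N(0, 0.01)) can be illustrated with a minimal tabular Q-learning loop. This is a hedged sketch, not the paper's code: the 5-state chain environment, the learning rate of 0.1, and the episode counts are illustrative assumptions; only the ϵ, γ, and initialization values come from the reported configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain MDP (an assumption for illustration): 5 states, actions
# 0 = move left, 1 = move right; reward 1 for being in the right-most state.
N_STATES, N_ACTIONS = 5, 2
EPSILON, GAMMA = 0.15, 0.95   # values reported in the experiment setup
ALPHA = 0.1                   # learning rate: illustrative assumption

# Q-table initialised from N(mu=0, sigma=0.01), mirroring the reported init.
Q = rng.normal(loc=0.0, scale=0.01, size=(N_STATES, N_ACTIONS))

def step(s, a):
    """Toy dynamics: deterministic moves, reward at the right-most state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1)

for episode in range(200):
    s = 0
    for t in range(50):
        # epsilon-greedy action selection with epsilon = 0.15
        if rng.random() < EPSILON:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # standard Q-learning update with discount gamma = 0.95
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s2]) - Q[s, a])
        s = s2

greedy_policy = np.argmax(Q, axis=1)
```

After training, the greedy policy should select "move right" in every state, since the value function increases monotonically toward the rewarding state.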