reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Online Episodic Convex Reinforcement Learning

Authors: Bianca Marin Moreno, Khaled Eldowa, Pierre Gaillard, Margaux Brégère, Nadia Oudjane

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate Bonus O-MD-CURL on the multi-objective and constrained MDP tasks from (Geist et al., 2022), which use fixed objective functions and fixed probability kernels across time steps. ... These examples empirically demonstrate the value of the additive bonus in tasks requiring exploration.
Researcher Affiliation	Collaboration	1Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France. 2EDF Lab, 7 bd Gaspard Monge, 91120 Palaiseau, France. 3FiME (Laboratoire de Finance des Marchés de l'Énergie, Dauphine, CREST, EDF R&D). 4Università degli Studi di Milano, Milan, Italy. 5Politecnico di Milano, Milan, Italy. 6Sorbonne Université, LPSM, Paris, France.
Pseudocode	Yes	Algorithm 1 Bonus O-MD-CURL (Full-information)
Open Source Code	Yes	The code to reproduce the empirical results is available at: https://github.com/biancammoreno/Convex_RL
Open Datasets	Yes	We evaluate Bonus O-MD-CURL on the multi-objective and constrained MDP tasks from (Geist et al., 2022), which use fixed objective functions and fixed probability kernels across time steps.
Dataset Splits	No	The paper describes the environment setup (e.g., "11 × 11 four-room grid world") and task parameters ("N = 40, τ = 0.01, and 5 repetitions per experiment") but does not provide the training/validation/test dataset splits typically found in supervised learning.
Hardware Specification	No	The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models) used for running the experiments. It only describes the simulation environment and experimental parameters.
Software Dependencies	No	The paper mentions that code is available on GitHub but does not specify any software libraries, frameworks, or their version numbers used in the implementation.
Experiment Setup	Yes	The state space is an 11 × 11 four-room grid world, with a single door connecting adjacent rooms. The agent can choose to stay still or move right, left, up, or down... The initial distribution is a Dirac delta at the upper left corner of the grid, as in Fig. 1 [left]. We take N = 40, τ = 0.01, and 5 repetitions per experiment. Multi-objectives: The goal is to concentrate the distribution on three targets by the final step N, as in Fig. 1 [middle]. The objective function is defined as $f_n(\mu_n^{\pi,p}) := \sum_{k=1}^{3} (1 - \langle \mu_n^{\pi,p}, e_k \rangle)^2$... Constrained MDPs: The goal is to concentrate the state distribution on the yellow target in Fig. 1 [right] while avoiding the constraint states in blue. The objective function is defined as $f_n(\mu_n^{\pi,p}) := -\langle r, \mu_n^{\pi,p} \rangle + (\langle \mu_n^{\pi,p}, c \rangle)^2$...
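
The two objectives quoted in the Experiment Setup row can be evaluated on a state distribution as in the minimal sketch below. It is not taken from the authors' repository. It assumes that both objectives are losses to be minimized (so the signs follow the convex formulation), that states of the 11 × 11 grid are flattened row by row, and that the target and constraint positions are hypothetical placeholders, since the row above does not give their coordinates. All function names are illustrative.

```python
# Illustrative sketch only: state layout, target cells, and constraint cells
# are assumptions, not values taken from the paper or its code.
GRID = 11
N_STATES = GRID * GRID


def state_index(row, col):
    """Flatten a (row, col) grid cell into a single state index (row-major)."""
    return row * GRID + col


def indicator(idx):
    """Dirac distribution / basis vector e_k concentrated on state idx."""
    vec = [0.0] * N_STATES
    vec[idx] = 1.0
    return vec


def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def multi_objective_loss(mu, targets):
    """sum_k (1 - <mu, e_k>)^2 over the target vectors e_k."""
    return sum((1.0 - inner(mu, e_k)) ** 2 for e_k in targets)


def constrained_loss(mu, reward, constraint):
    """-<r, mu> + (<mu, c>)^2: reward term plus squared constraint penalty."""
    return -inner(reward, mu) + inner(mu, constraint) ** 2


# Initial distribution: Dirac delta at the upper-left corner, as in the setup.
mu0 = indicator(state_index(0, 0))

# Hypothetical target cells (the three remaining corners of the grid).
targets = [
    indicator(state_index(0, GRID - 1)),
    indicator(state_index(GRID - 1, 0)),
    indicator(state_index(GRID - 1, GRID - 1)),
]

# Hypothetical reward and constraint vectors for the constrained task.
reward = indicator(state_index(GRID - 1, GRID - 1))
constraint = indicator(state_index(5, 5))
```

Under these assumptions, a distribution that places no mass on any target incurs the largest multi-objective loss, and mass placed on a constraint state is penalized quadratically in the constrained loss.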