Combining MORL with Restraining Bolts to Learn Normative Behaviour

Authors: Emery A. Neufeld, Agata Ciabattoni, Radu Florin Tulcan

IJCAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Case studies show that ONRBs offer a robust and principled foundation for RL agents to comply with a wide range of norms while achieving their goals. Figure 1: Agent trajectories demonstrating (a) managing conflicting norms, (b) CTD obligations, and (c) norm change, after 5k training episodes (10k for (a)). The plots in (d) depict average rewards (over 50 episodes) for (top) R, (middle) r₂Rφ₂, and (bottom) r₃Rφ₃. |
| Researcher Affiliation | Academia | Emery A. Neufeld, Agata Ciabattoni and Radu Florin Tulcan, TU Wien. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | For brevity, we omit the algorithm’s details, noting only that it computes the partial convex hull P(Π)... |
| Open Source Code | Yes | Implementation can be found at: https://github.com/lexeree/ordered-normative-restraining-bolts |
| Open Datasets | Yes | We utilize the Travelling Merchant environment, first introduced in [Neufeld et al., 2022]. |
| Dataset Splits | No | The paper uses a reinforcement learning environment rather than a static dataset that would typically require explicit splits. The mention of '5k training episodes (10k for (a))' refers to training duration, not dataset splitting. |
| Hardware Specification | Yes | The computations above were performed with an AMD Ryzen 7 5800H with Radeon Graphics (8 cores, 3.2 GHz) and 16 GB RAM, except the computation of wΦ₁, which ran on one core of an AMD EPYC 9334 (2.7 GHz), taking 2.5 hrs and <256 GB RAM. |
| Software Dependencies | No | The paper discusses various algorithms and theoretical frameworks like Q-learning, MOMDP, CHVI, and linear programming but does not specify any concrete software libraries or their version numbers used for implementation. |
| Experiment Setup | Yes | We utilize the Travelling Merchant environment, first introduced in [Neufeld et al., 2022]. This environment is an elaborate adaptation of the resource-gathering game... It entails an RL agent, a merchant, traversing a map and collecting resources to sell at a market... The agent is rewarded based on how many resources it gathers... States are labelled with the cell type (e.g., at danger), whether or not the agent is being attacked (attack), whether or not the sun is down (sundown), and what it has in its inventory (e.g., has ore). Figure 1: Agent trajectories demonstrating ... after 5k training episodes (10k for (a)) |
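The partial convex hull P(Π) quoted in the Pseudocode row is the standard multi-objective-RL construct: the subset of policy value vectors that are optimal for some non-negative weighting of the objectives. Since the paper omits its algorithm, the sketch below is a generic two-objective version (monotone-chain upper hull plus a Pareto filter), not the authors' implementation; all function names are illustrative.

```python
def upper_convex_hull(points):
    """Upper convex hull of 2-D value vectors, left to right.

    Generic sketch: a vector can only be optimal for some weighting
    w1*v1 + w2*v2 (w1, w2 >= 0) if it lies on this hull.
    """
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    hull = []
    for p in pts:
        # Pop points that make a non-clockwise turn: they fall below
        # the upper hull and are optimal for no non-negative weight.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def convex_coverage_set(points):
    """Hull points that are additionally not Pareto-dominated."""
    return [p for p in upper_convex_hull(points)
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1]
                       for q in points)]
```

For example, `convex_coverage_set([(0, 3), (1, 2), (2, 2), (3, 0)])` drops `(1, 2)`, which is inside the hull and therefore optimal for no weighting, while keeping the three extreme trade-offs.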
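The Experiment Setup row describes states labelled with propositions (at danger, attack, sundown, has ore) that a restraining bolt can evaluate. A minimal sketch of such a labelled state is shown below; the class, field, and label names are assumptions based only on the quoted description, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MerchantState:
    """Hypothetical labelled state for the Travelling Merchant environment."""
    cell: str                           # cell type the agent occupies, e.g. "danger"
    attack: bool                        # whether the agent is being attacked
    sundown: bool                       # whether the sun is down
    inventory: frozenset = frozenset()  # resources held, e.g. {"ore"}

    def labels(self):
        """Propositional labels a restraining bolt would evaluate on this state."""
        labs = {f"at_{self.cell}"}
        if self.attack:
            labs.add("attack")
        if self.sundown:
            labs.add("sundown")
        labs |= {f"has_{r}" for r in self.inventory}
        return labs


s = MerchantState(cell="danger", attack=True, sundown=False,
                  inventory=frozenset({"ore"}))
# s.labels() == {"at_danger", "attack", "has_ore"}
```

In a restraining-bolt setup, such label sets feed the automaton tracking each temporal-logic norm, which adds its normative reward on top of the environment's resource reward.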