Combining MORL with Restraining Bolts to Learn Normative Behaviour

Authors: Emery A. Neufeld, Agata Ciabattoni, Radu Florin Tulcan

IJCAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Case studies show that ONRBs offer a robust and principled foundation for RL agents to comply with a wide range of norms while achieving their goals. Figure 1: Agent trajectories demonstrating (a) managing conflicting norms, (b) CTD obligations, and (c) norm change, after 5k training episodes (10k for (a)). The plots in (d) depict average rewards (over 50 episodes) for (top) R, (middle) r₂Rφ₂, and (bottom) r₃Rφ₃. |
| Researcher Affiliation | Academia | Emery A. Neufeld, Agata Ciabattoni and Radu Florin Tulcan, TU Wien. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | For brevity, we omit the algorithm’s details, noting only that it computes the partial convex hull P(Π)... |
| Open Source Code | Yes | Implementation can be found at: https://github.com/lexeree/ordered-normative-restraining-bolts |
| Open Datasets | Yes | We utilize the Travelling Merchant environment, first introduced in [Neufeld et al., 2022]. |
| Dataset Splits | No | The paper uses a reinforcement learning environment rather than a static dataset that would typically require explicit splits. The mention of '5k training episodes (10k for (a))' refers to training duration, not dataset splitting. |
| Hardware Specification | Yes | The computations above were performed with an AMD Ryzen 7 5800H with Radeon Graphics (8 cores, 3.2 GHz) and 16 GB RAM, except the computation of wΦ₁, which ran on one core of an AMD EPYC 9334 (2.7 GHz), taking 2.5 hrs and <256 GB RAM. |
| Software Dependencies | No | The paper discusses various algorithms and theoretical frameworks like Q-learning, MOMDP, CHVI, and linear programming but does not specify any concrete software libraries or their version numbers used for implementation. |
| Experiment Setup | Yes | We utilize the Travelling Merchant environment, first introduced in [Neufeld et al., 2022]. This environment is an elaborate adaptation of the resource-gathering game... It entails an RL agent, a merchant, traversing a map and collecting resources to sell at a market... The agent is rewarded based on how many resources it gathers... States are labelled with the cell type (e.g., at danger), whether or not the agent is being attacked (attack), whether or not the sun is down (sundown), and what it has in its inventory (e.g., has ore). Figure 1: Agent trajectories demonstrating ... after 5k training episodes (10k for (a)) |
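The partial convex hull P(Π) quoted in the Pseudocode row is the standard multi-objective-RL construct: the subset of policy value vectors that are optimal for some non-negative weighting of the objectives. Since the paper omits its algorithm, the sketch below is a generic two-objective version (monotone-chain upper hull plus a Pareto filter), not the authors' implementation; all function names are illustrative.

```python
def upper_convex_hull(points):
    """Upper convex hull of 2-D value vectors, left to right.

    Generic sketch: a vector can only be optimal for some weighting
    w1*v1 + w2*v2 (w1, w2 >= 0) if it lies on this hull.
    """
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    hull = []
    for p in pts:
        # Pop points that make a non-clockwise turn: they fall below
        # the upper hull and are optimal for no non-negative weight.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def convex_coverage_set(points):
    """Hull points that are additionally not Pareto-dominated."""
    return [p for p in upper_convex_hull(points)
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1]
                       for q in points)]
```

For example, `convex_coverage_set([(0, 3), (1, 2), (2, 2), (3, 0)])` drops `(1, 2)`, which is inside the hull and therefore optimal for no weighting, while keeping the three extreme trade-offs.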
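The Experiment Setup row describes states labelled with propositions (at danger, attack, sundown, has ore) that a restraining bolt can evaluate. A minimal sketch of such a labelled state is shown below; the class, field, and label names are assumptions based only on the quoted description, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MerchantState:
    """Hypothetical labelled state for the Travelling Merchant environment."""
    cell: str                           # cell type the agent occupies, e.g. "danger"
    attack: bool                        # whether the agent is being attacked
    sundown: bool                       # whether the sun is down
    inventory: frozenset = frozenset()  # resources held, e.g. {"ore"}

    def labels(self):
        """Propositional labels a restraining bolt would evaluate on this state."""
        labs = {f"at_{self.cell}"}
        if self.attack:
            labs.add("attack")
        if self.sundown:
            labs.add("sundown")
        labs |= {f"has_{r}" for r in self.inventory}
        return labs


s = MerchantState(cell="danger", attack=True, sundown=False,
                  inventory=frozenset({"ore"}))
# s.labels() == {"at_danger", "attack", "has_ore"}
```

In a restraining-bolt setup, such label sets feed the automaton tracking each temporal-logic norm, which adds its normative reward on top of the environment's resource reward.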