Combining MORL with Restraining Bolts to Learn Normative Behaviour
Authors: Emery A. Neufeld, Agata Ciabattoni, Radu Florin Tulcan
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Case studies show that ONRBs offer a robust and principled foundation for RL-agents to comply with a wide range of norms while achieving their goals. Figure 1: Agent trajectories demonstrating (a) managing conflicting norms, (b) CTD obligations, and (c) norm change, after 5k training episodes (10k for (a)). The plots in (d) depict average rewards (over 50 episodes) for (top) R, (middle) r₂·Rφ₂, and (bottom) r₃·Rφ₃. |
| Researcher Affiliation | Academia | Emery A. Neufeld, Agata Ciabattoni and Radu Florin Tulcan, TU Wien. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | For brevity, we omit the algorithm’s details, noting only that it computes the partial convex hull P(Π)... |
| Open Source Code | Yes | Implementation can be found at: https://github.com/lexeree/ordered-normative-restraining-bolts |
| Open Datasets | Yes | We utilize the Travelling Merchant environment, first introduced in [Neufeld et al., 2022]. |
| Dataset Splits | No | The paper uses a reinforcement learning environment rather than a static dataset that would typically require explicit splits. The text mentions '5k training episodes (10k for (a))' which refers to training duration, not dataset splitting. |
| Hardware Specification | Yes | The computations above were performed with an AMD Ryzen 7 5800H with Radeon Graphics (8 cores, 3.2 GHz) and 16 GB RAM, except the computation of wΦ1, which ran on one core of an AMD EPYC 9334 (2.7 GHz), taking 2.5 hrs and <256 GB RAM. |
| Software Dependencies | No | The paper discusses various algorithms and theoretical frameworks like Q-learning, MOMDP, CHVI, and linear programming but does not specify any concrete software libraries or their version numbers used for implementation. |
| Experiment Setup | Yes | We utilize the Travelling Merchant environment, first introduced in [Neufeld et al., 2022]. This environment is an elaborate adaptation of the resource-gathering game... It entails an RL agent, a merchant, traversing a map and collecting resources to sell at a market... The agent is rewarded based on how many resources it gathers... States are labelled with the cell type (e.g., at danger), whether or not the agent is being attacked (attack), whether or not the sun is down (sundown), and what it has in its inventory (e.g., has ore). Figure 1: Agent trajectories demonstrating ... after 5k training episodes (10k for (a)) |
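The paper's setup combines a task reward with norm-compliance rewards learned via Q-learning. As a rough illustration of that idea (not the paper's actual method or weights), the sketch below runs tabular Q-learning on a vector-valued reward that is linearly scalarized into task reward plus a weighted norm penalty; the toy environment, weights, and all function names are assumptions made for this example.

```python
import random
from collections import defaultdict

def scalarize(reward_vec, weights):
    """Collapse a (task, norm) reward vector into one scalar (assumed weighting)."""
    return sum(w * r for w, r in zip(weights, reward_vec))

def q_learning(env_step, env_reset, actions, weights,
               episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Plain epsilon-greedy tabular Q-learning over the scalarized reward."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env_reset(), False
        while not done:
            if random.random() < eps:
                action = random.choice(actions)          # explore
            else:
                action = max(actions, key=lambda a: Q[(state, a)])  # exploit
            next_state, reward_vec, done = env_step(state, action)
            r = scalarize(reward_vec, weights)
            target = r if done else r + gamma * max(Q[(next_state, b)] for b in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q

# Hypothetical one-step environment: "sell" earns more task reward but
# violates a norm (penalty -1); "wait" earns less but is compliant.
def toy_reset():
    return 0

def toy_step(state, action):
    if action == "sell":
        return None, (1.0, -1.0), True   # (task reward, norm penalty)
    return None, (0.5, 0.0), True

random.seed(0)
Q = q_learning(toy_step, toy_reset, ["sell", "wait"], weights=(1.0, 1.0))
```

With these (assumed) weights, the norm penalty outweighs the extra task reward, so the learned values favour the compliant action `"wait"`.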