Language Model Alignment in Multilingual Trolley Problems

Authors: Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti, Sydney Levine, Jiarui Liu, Fernando Gonzalez Adauto, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, Bernhard Schölkopf

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the moral alignment of LLMs with human preferences in multilingual trolley problems. ... Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions... Our findings reveal that very few LLMs demonstrate overall alignment with human preferences.
Researcher Affiliation Academia 1Max Planck Institute for Intelligent Systems, Tübingen 2ETH Zürich 3University of Toronto 4University of Washington 5Allen Institute for AI (AI2) 6Carnegie Mellon University 7University of Trieste 8University of Michigan
Pseudocode No The paper describes the methodology and prompt construction in detail but does not contain any structured pseudocode or algorithm blocks.
Open Source Code Yes Our code and data are at https://github.com/causal NLP/multi TP.
Open Datasets Yes We introduce our Multilingual Trolley Problems (MULTITP) dataset... Our code and data are at https://github.com/causal NLP/multi TP. ... The original Moral Machine dataset was collected by Awad et al. (2018) and is available here: https://osf.io/3hvt2/
Dataset Splits No The paper states the MULTITP dataset comprises 98,440 trolley problem vignettes and uses the Moral Machine dataset as ground truth, but it does not specify explicit training, validation, or test splits for models, as the paper evaluates pre-trained LLMs rather than training new models on split data.
Hardware Specification No The paper lists VRAM requirements for various LLM models being evaluated (Table 4) and mentions estimated API costs (Table 5), but it does not specify the exact hardware (e.g., GPU models, CPU types) used by the authors to run their experiments or evaluations.
Software Dependencies No The paper mentions using the googletrans Python package for translations but does not specify its version. It also lists the specific LLM models evaluated but does not provide version numbers for ancillary software or libraries used for the evaluation process itself.
Experiment Setup Yes For reproducibility, we fix the random seed and set the temperature to zero for the generation. ... We employ the token-forcing method (Wei et al., 2023; Carlini et al., 2023): Q: [Vignette Description] A: If the self-driving car has to make a decision, between the two choices, it should save...