Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

Authors: Mohamad Fares El Hajj Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate SITAlign's performance through extensive experimentation on multiple benchmarks. For instance, on the PKU-Safe RLHF dataset with the primary objective of maximizing helpfulness while ensuring a threshold on harmlessness, SITAlign outperforms the state-of-the-art multi-objective decoding strategy by a margin of 22.3% in terms of GPT-4 win-tie rate for helpfulness reward while adhering to the threshold on harmlessness."
Researcher Affiliation | Academia | (1) University of Texas at Austin, Austin, TX, USA; (2) University of Maryland, College Park, MD, USA; (3) University of Central Florida, Orlando, FL, USA.
Pseudocode | Yes | "Algorithm 1 SITAlign: Satisficing Inference-Time Alignment of Large Language Models"
Open Source Code | No | The paper states: "Reproducibility is ensured through the use of publicly available resources." However, it provides no specific link and no explicit statement that the authors' own source code is open-sourced or included in supplementary materials.
Open Datasets | Yes | The paper uses the PKU-Safe RLHF dataset (Ji et al., 2024), the Anthropic-HH dataset (Bai et al., 2022b), and the Summarize-from-Feedback dataset (Stiennon et al., 2020b).
Dataset Splits | No | The paper mentions using "the test set of PKU-Safe RLHF dataset" but does not specify split percentages or sample counts for the training, validation, or test sets of any dataset used.
Hardware Specification | No | The paper does not report hardware details such as GPU models, CPU types, or memory specifications used to run the experiments.
Software Dependencies | No | The paper mentions using GPT-4 for evaluation and refers to models such as Zephyr-7B-β, Skywork-Reward-Llama-3.1-8B-v0.2, MPT-7B-Chat, and Minotaur-7B, but it does not specify version numbers for any ancillary software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "For implementation, we set the number of tokens sampled (top-k) k = 10 and the decoding alignment parameter α = 1."
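The setup row above reports k = 10 and α = 1 for a decoding rule that maximizes helpfulness subject to a harmlessness threshold. The following is a rough, hypothetical sketch of one such satisficing decoding step, not the paper's actual Algorithm 1: the function name, the candidate tuple layout, the fallback rule, and all reward values are assumptions made for illustration.

```python
# Illustrative sketch of a satisficing decoding step (NOT the paper's exact
# SITAlign algorithm): among the top-k candidate tokens, keep only those whose
# harmlessness reward meets a threshold, then pick the candidate maximizing
# base log-probability + alpha * helpfulness reward. The paper reports k = 10
# and alpha = 1; the candidate tuples and reward values below are made up.

def satisficing_step(candidates, alpha=1.0, harm_threshold=0.0):
    """Select one token from the top-k candidates.

    candidates: list of (token, base_logprob, helpfulness, harmlessness).
    If no candidate satisfies the harmlessness constraint, fall back to the
    least harmful candidate (an assumed fallback, chosen for illustration).
    """
    feasible = [c for c in candidates if c[3] >= harm_threshold]
    if not feasible:
        return max(candidates, key=lambda c: c[3])[0]
    return max(feasible, key=lambda c: c[1] + alpha * c[2])[0]

# Toy example: "b" has the highest combined helpfulness score but violates
# the harmlessness threshold, so the constrained rule selects "a" instead.
cands = [("a", -1.0, 0.9, 0.5), ("b", -0.2, 0.8, -0.3), ("c", -2.0, 1.5, 0.1)]
print(satisficing_step(cands))                       # constrained choice
print(satisficing_step(cands, harm_threshold=-1.0))  # constraint relaxed
```

In a real decoder this selection would run once per generated token over the model's top-k logits, with the two rewards coming from separate reward models; here plain tuples stand in for that machinery.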