Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time
Authors: Mohamad Fares El Hajj Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate SITAlign's performance through extensive experimentation on multiple benchmarks. For instance, on the PKU-Safe RLHF dataset with the primary objective of maximizing helpfulness while ensuring a threshold on harmlessness, SITAlign outperforms the state-of-the-art multi-objective decoding strategy by a margin of 22.3% in terms of GPT-4 win-tie rate for helpfulness reward while adhering to the threshold on harmlessness. |
| Researcher Affiliation | Academia | 1University of Texas at Austin, Austin, TX, USA 2University of Maryland, College Park, MD, USA 3University of Central Florida, Orlando, FL, USA. |
| Pseudocode | Yes | Algorithm 1 SITAlign: Satisficing Inference-Time Alignment of Large Language Models |
| Open Source Code | No | The paper states: "Reproducibility is ensured through the use of publicly available resources." However, it provides no repository link and no explicit statement that the authors' own source code is open-sourced or included in supplementary materials. |
| Open Datasets | Yes | For this task, we utilize the PKU-Safe RLHF dataset (Ji et al., 2024). This task is based on the Anthropic-HH dataset (Bai et al., 2022b). For this task, we utilize the Summarize-from-Feedback dataset (Stiennon et al., 2020b). |
| Dataset Splits | No | The paper mentions using "the test set of PKU-Safe RLHF dataset" but does not specify the exact split percentages or sample counts for training, validation, or test sets for any of the datasets used. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using GPT-4 for evaluation and refers to models like Zephyr-7B-β, Skywork-Reward-Llama-3.1-8B-v0.2, MPT-7B-Chat, and Minotaur-7B. However, it does not specify version numbers for any ancillary software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For implementation, we set the number of tokens sampled (top-k) k = 10 and the decoding alignment parameter α = 1. |
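The satisficing decoding idea behind the setup row above (top-k = 10, alignment parameter α = 1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the per-token reward dictionaries, and the fallback rule for the infeasible case are all assumptions made for the sake of a self-contained example.

```python
def satisficing_topk_step(token_logprobs, primary_reward, secondary_reward,
                          threshold, k=10, alpha=1.0):
    """Choose the next token from the top-k candidates of a base LM.

    Illustrative sketch: candidates whose secondary reward (e.g.
    harmlessness) meets the threshold are "satisficing"; among them we
    maximize log-prob + alpha * primary reward (e.g. helpfulness).
    The fallback when no candidate satisfies the threshold is a
    hypothetical choice, not taken from the paper.
    """
    # Keep only the k most likely candidate tokens.
    topk = sorted(token_logprobs.items(),
                  key=lambda kv: kv[1], reverse=True)[:k]
    # Satisficing set: candidates meeting the secondary-reward threshold.
    feasible = [(tok, lp) for tok, lp in topk
                if secondary_reward[tok] >= threshold]
    if feasible:
        # Maximize the primary objective over the satisficing set.
        return max(feasible,
                   key=lambda kv: kv[1] + alpha * primary_reward[kv[0]])[0]
    # Assumed fallback: pick the least-harmful candidate available.
    return max(topk, key=lambda kv: secondary_reward[kv[0]])[0]


# Toy usage with made-up per-token scores:
logprobs = {"a": -0.1, "b": -0.5, "c": -2.0}
helpful = {"a": 0.2, "b": 0.9, "c": 0.1}
harmless = {"a": 0.1, "b": 0.8, "c": 0.9}
print(satisficing_topk_step(logprobs, helpful, harmless,
                            threshold=0.5, k=2, alpha=1.0))  # -> "b"
```

With k = 2 only "a" and "b" survive the top-k cut; "a" fails the harmlessness threshold, so "b" is selected despite its lower log-probability.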