ReFF: Reinforcing Format Faithfulness in Language Models Across Varied Tasks

Authors: Jiashu Yao, Heyan Huang, Zeming Liu, Haoyu Wen, Wei Su, Boao Qian, Yuhang Guo

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on the benchmark reveal that state-of-the-art open- and closed-source LLMs still suffer from severe deficiency in format faithfulness. By virtue of the decidable nature of formats, we propose to Reinforce Format Faithfulness (REFF) to help LLMs generate formatted output as instructed without compromising general quality. Extensive experiments of REFF on FORMATBENCH yield highly favorable results.
Researcher Affiliation Academia 1. School of Computer Science and Technology, Beijing Institute of Technology; 2. School of Computer Science and Engineering, Beihang University
Pseudocode Yes Algorithm 1: REFF
Input: query set Q, format checker F, LLM M, # epochs n
Output: adapted LLM M'
1: Let M' <- M
2: for epoch in [1, 2, ..., n] do
3:   for q in Q do
4:     r <- M'(q)             // response generation
5:     s <- F(q, r)           // format checking, s in {-1, 1}
6:     M' <- step(M', q, r, s)  // PPO stepping
7:   end for
8: end for
9: return M'
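Algorithm 1 can be sketched as a plain loop. The snippet below is a minimal, dependency-free illustration; the model, format checker, and PPO step are hypothetical stubs standing in for the paper's actual trl-based implementation.

```python
# Minimal sketch of the REFF loop (Algorithm 1).
# generate, format_checker, and ppo_step are hypothetical stand-ins:
# in the paper, generation comes from the LLM and ppo_step is an
# RLHF-style PPO update implemented with the trl library.

def reff(queries, format_checker, generate, ppo_step, n_epochs):
    """For each query: generate a response, score its format, reinforce."""
    for _ in range(n_epochs):
        for q in queries:
            r = generate(q)                          # response generation
            s = 1 if format_checker(q, r) else -1    # format check, s in {-1, 1}
            ppo_step(q, r, s)                        # PPO step updates the model

# Toy usage: reward any response containing a "key: value" colon.
rewards = []

def toy_checker(q, r):
    return ":" in r

def toy_generate(q):
    return f"answer: {q}"

def toy_step(q, r, s):
    rewards.append(s)

reff(["q1", "q2"], toy_checker, toy_generate, toy_step, n_epochs=2)
# rewards is now [1, 1, 1, 1]: 2 epochs x 2 queries, all format-valid
```

The key property the loop exploits is that format validity is decidable, so the reward signal F(q, r) needs no human labels.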
Open Source Code Yes Code & Datasets https://github.com/BITHLP/ReFF
Open Datasets Yes Code & Datasets https://github.com/BITHLP/ReFF. To address the gap in comprehensive benchmarks, we combine adaptation of existing datasets, online data collection, and manual data annotation, presenting FORMATBENCH.
Dataset Splits Yes Table 3: Data used for RL in the three settings of REFF.

Settings      Test Queries  Train Queries  Train Labels
REFF-tst      yes           no             no
REFF-trn      no            yes            no
REFF-trn-ft   no            yes            yes

Test-Only REFF: When no extra training data exists, the LLM can use the queries in the test set as the query set Q. Notably, no test-set labels are available to the model in this setting. Train-Only REFF w./wo. Finetuning: The train-only setting can be applied in an online scenario, where queries are processed and responded to one by one, since the adaptation of the LLM involves only training queries as the query set Q. Additionally, considering that a training set often includes both queries and labels, we further study a train-only-with-finetuning setting, where the reinforcement process is run after finetuning on the training set.
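The choice of query set Q across the three settings of Table 3 can be expressed as a small dispatch. This helper is a hypothetical illustration (the function and setting names mirror the table, not the paper's code):

```python
# Hypothetical helper selecting the RL query set Q for each REFF setting
# from Table 3. Labels are only consumed by the separate finetuning stage
# in REFF-trn-ft; the RL loop itself never sees them.

def build_query_set(setting, test_queries, train_queries):
    if setting == "REFF-tst":
        # Test-only: RL runs on test queries, with no labels available.
        return test_queries
    if setting in ("REFF-trn", "REFF-trn-ft"):
        # Train-only (with or without prior finetuning): RL uses train queries.
        return train_queries
    raise ValueError(f"unknown setting: {setting}")
```

In all three settings the reward still comes solely from the format checker, which is why even the label-free REFF-tst setting is viable.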
Hardware Specification No No specific hardware details (like GPU/CPU models or types) were mentioned for running the experiments. The paper lists LLMs used but not the computing infrastructure.
Software Dependencies No The paper states: "We use trl (von Werra et al. 2020) library to implement the finetuning and the RLHF-style PPO of REFF." While a library is named, a specific version number for the 'trl' library is not provided, which is required for reproducibility.
Experiment Setup Yes Hyper-Parameters To ensure the robustness and reliability of the results, we try to use default and commonly-used hyper-parameters, and keep them consistent across experiments. Here we list several key points; the detailed hyper-parameters are outlined in Appendix D. In generation, we adopt greedy decoding in all experiments for a fair and efficient comparison. We use LoRA (Hu et al. 2021) in all LLM adaptation experiments with a consistent configuration r = 16. In finetuning, we use a constant learning rate of 2e-5 and train for 3 epochs with 256 instances per batch. In reinforcement learning, we set the target KL divergence to 6, use a constant learning rate of 1.41e-5, and train for 3 epochs with 32 instances per batch.
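The reported RL hyper-parameters map naturally onto the configuration objects of the libraries the paper names. The fragment below is a sketch only: it assumes peft's LoraConfig and trl's PPOConfig with argument names from the trl 0.x API, which may differ across versions, and it is not the paper's actual configuration file.

```python
# Hypothetical configuration mirroring the reported hyper-parameters.
# Argument names assume peft's LoraConfig and an older trl PPOConfig API;
# verify against the installed library versions before use.
from peft import LoraConfig
from trl import PPOConfig

lora_config = LoraConfig(r=16)  # same LoRA rank in all adaptation runs

ppo_config = PPOConfig(
    learning_rate=1.41e-5,  # constant RL learning rate
    batch_size=32,          # 32 instances per RL batch
    target=6,               # target KL divergence for the adaptive KL controller
)
```

Note that greedy decoding for generation and the finetuning schedule (lr 2e-5, 3 epochs, batch 256) are configured separately from this PPO fragment.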