ReFF: Reinforcing Format Faithfulness in Language Models Across Varied Tasks
Authors: Jiashu Yao, Heyan Huang, Zeming Liu, Haoyu Wen, Wei Su, Boao Qian, Yuhang Guo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the benchmark reveal that state-of-the-art open- and closed-source LLMs still suffer from severe deficiency in format faithfulness. By virtue of the decidable nature of formats, we propose to Reinforce Format Faithfulness (REFF) to help LLMs generate formatted output as instructed without compromising general quality. Extensive experiments of REFF on FORMATBENCH yield highly favorable results. |
| Researcher Affiliation | Academia | 1School of Computer Science and Technology, Beijing Institute of Technology 2School of Computer Science and Engineering, Beihang University EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: REFF. Input: query set Q, format checker F, LLM M, # epochs n. Output: adapted LLM M′. 1: Let M′ ← M; 2: for epoch in [1, 2, ..., n] do; 3: for q in Q do; 4: r ← M′(q) // response generation; 5: s ← F(q, r) // format checking, s ∈ {−1, 1}; 6: M′ ← step(M′, q, r, s) // PPO stepping; 7: end for; 8: end for; 9: return M′ |
| Open Source Code | Yes | Code & Datasets: https://github.com/BITHLP/ReFF |
| Open Datasets | Yes | Code & Datasets: https://github.com/BITHLP/ReFF. To address the gap in comprehensive benchmarks, we combine adaptation of existing datasets, online data collection, and manual data annotation, presenting FORMATBENCH. |
| Dataset Splits | Yes | Table 3: Data used for RL in three settings of REFF — REFF-tst: test queries ✓, train queries ✗, train labels ✗; REFF-trn: test queries ✗, train queries ✓, train labels ✗; REFF-trn-ft: test queries ✗, train queries ✓, train labels ✓. Test-Only REFF: When there exists no extra training data, LLMs can use queries in the test set as the query set Q. Notably, no label of the test set is available to the model in this setting. Train-Only REFF w./wo. Finetuning: The train-only setting can be applied in an online scenario, where the queries are processed and responded to one by one, as the adaptation of LLMs only involves training queries as the query set Q. Additionally, considering that a training set often includes both queries and labels, we further study a train-only with finetuning setting, where the reinforcement process is implemented after finetuning on the training set. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or types) were mentioned for running the experiments. The paper lists LLMs used but not the computing infrastructure. |
| Software Dependencies | No | The paper states: "We use trl (von Werra et al. 2020) library to implement the finetuning and the RLHF-style PPO of REFF." While a library is named, a specific version number for the 'trl' library is not provided, which is required for reproducibility. |
| Experiment Setup | Yes | Hyper-Parameters: To ensure the robustness and reliability of the results, we try to use default and commonly-used hyper-parameters, and keep them consistent among different experiments. Here we list several key points, and the detailed hyper-parameters are outlined in Appendix D. In generation, we adopt greedy decoding in all experiments for a fair and efficient comparison. We use LoRA (Hu et al. 2021) in all LLM adaptation experiments with a consistent configuration r = 16. In finetuning, we use a constant learning rate 2e-5 and train for 3 epochs with 256 instances per batch. In reinforcement learning, we set the target KL divergence to 6, use a constant learning rate 1.41e-5, and train for 3 epochs with 32 instances per batch. |
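The REFF loop in Algorithm 1 above can be sketched in plain Python. This is a minimal sketch, not the paper's implementation: the policy `generate`, the checker `format_checker`, and `ppo_step` are passed in as callables, whereas the paper uses an actual LLM, a decidable format checker, and a trl PPO update.

```python
def reff(queries, format_checker, generate, ppo_step, n_epochs=3):
    """Algorithm 1 (REFF): reinforce format faithfulness.

    queries        -- iterable of task queries Q
    format_checker -- F(q, r) -> bool (formats are decidable)
    generate       -- M'(q) -> response r from the current policy
    ppo_step       -- update rule step(q, r, s), applied in place
    """
    for _ in range(n_epochs):
        for q in queries:
            r = generate(q)                         # response generation
            s = 1 if format_checker(q, r) else -1   # reward s in {-1, 1}
            ppo_step(q, r, s)                       # PPO stepping


# Toy usage: the "format" requires an all-uppercase response, the stub
# policy always complies, and the stub PPO step just records rewards.
rewards = []
reff(
    queries=["hello", "WORLD"],
    format_checker=lambda q, r: r.isupper(),
    generate=str.upper,
    ppo_step=lambda q, r, s: rewards.append(s),
    n_epochs=1,
)
# rewards == [1, 1]
```

Because the reward depends only on whether the checker accepts the response, the loop needs no gold labels — which is what makes the test-only and train-only settings in Table 3 possible.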
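The reported hyper-parameters map onto trl/peft configuration objects roughly as follows. This is a hedged config fragment, not the authors' code: the paper pins no library versions, and these argument names follow recent trl/peft releases, so they may differ in other versions.

```python
from peft import LoraConfig
from trl import PPOConfig

# LoRA adaptation with r = 16, as reported in the paper.
lora_config = LoraConfig(r=16)

# RL stage: constant learning rate 1.41e-5, 32 instances per batch,
# adaptive KL control with a target KL divergence of 6.
ppo_config = PPOConfig(
    learning_rate=1.41e-5,
    batch_size=32,
    adap_kl_ctrl=True,
    target=6,
)
```

Pinning the exact trl version would resolve the Software Dependencies gap noted above, since these defaults and argument names have changed across trl releases.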