DAMA: Data- and Model-aware Alignment of Multi-modal LLMs
Authors: Jinda Lu, Junkang Wu, Jinghan Li, Xiaojun Jia, Shuo Wang, Yifan Zhang, Junfeng Fang, Xiang Wang, Xiangnan He
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five benchmarks demonstrate that DAMA not only significantly enhances trustworthiness, but also improves effectiveness on general tasks. For instance, on Object HalBench, our DAMA-7B reduces response-level and mention-level hallucination by 90.0% and 95.3%, respectively, surpassing the performance of GPT-4V. |
| Researcher Affiliation | Academia | 1 MoE Key Lab of BIPC, University of Science and Technology of China; 2 Nanyang Technological University; 3 Institute of Automation, Chinese Academy of Sciences; 4 National University of Singapore. Correspondence to: Xiang Wang <EMAIL>, Xiangnan He <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: DAMA. Input: preference dataset D, hyper-parameter β, SFT model π_SFT, CLIP classifier Γ_CLIP. Output: the optimized model π_θ. Initialize policy π_θ and reference model π_ref as π_SFT. For each (I, x, y_w, y_l) in D: S_w ← LLM{y_w}, S_l ← LLM{y_l}; obtain δ from {I, S_w}, {I, S_l} (Eqs. 3–5); α_D ← σ(δ)/σ(−δ) (Eq. 6). Repeat: for each batch B = {(I_i, x_i, y_{w,i}, y_{l,i})}_{i=1}^N ⊂ D: obtain R_i from y_{w,i} and y_{l,i} (Eq. 8); obtain R_B from {R_i} (Eqs. 9–11); α_M ← σ(−R_B)/σ(−R̄) (Eq. 12); α ← α_D^B ⊙ α_M, where α_D^B = {α_{D,i}}_{i=1}^N (Eq. 15); β_C ← β · α (Eq. 16); compute the loss w.r.t. β_C, π_θ (Eq. 2); compute the gradient and update π_θ; R̄ ← γ·R̄ + (1−γ)·R_B (Eq. 14); until the optimization converges. |
| Open Source Code | Yes | Code is available at: https://github.com/injadlu/DAMA. |
| Open Datasets | Yes | Dataset: Our focus is not on preference data construction, so we directly utilize the dataset released by (Yu et al., 2024c), which contains 22k preference pairs in total. |
| Dataset Splits | No | The paper states: "Dataset: Our focus is not on the preference data construction, thus we directly utilize the released dataset by (Yu et al., 2024c), which contains 22k preference data totally." and "For both LLaVA-1.5 7B and 13B models, we employ full parameter-tuning over the preference dataset with four epochs." While a dataset is used, the paper does not specify any train/validation/test splits for its own experiments, nor does it refer to predefined splits for the 22k preference pairs from the cited work. |
| Hardware Specification | Yes | All experiments are conducted with four A100 80GB GPUs, and four epochs of fine-tuning cost seven hours for both backbones. |
| Software Dependencies | No | The paper mentions "we adopt the same hyperparameters as provided in the official LLaVA GitHub repository" but does not provide specific version numbers for any software components (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Implementation Details. For both LLaVA-1.5 7B and 13B models, we employ full parameter-tuning over the preference dataset with four epochs. Specifically, for reproducibility, we adopt the same hyperparameters as provided in the official LLaVA GitHub repository. The batch size N is set to 16, the selected size K is set to 12, and the penalty hyperparameter β is set to 0.1, following (Rafailov et al., 2024; Yu et al., 2024c). |
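The weighting steps in Algorithm 1 can be sketched in a few lines. This is an illustrative reconstruction from the extracted pseudocode, not the authors' code: the exact signs in Eqs. (6) and (12) and the EMA coefficient `gamma` are assumptions and should be checked against the paper. Here `delta` is the CLIP-based score gap between the chosen and rejected response, `batch_margin` is the batch reward margin R_B, and `running_margin` is its moving average R̄.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function sigma(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def data_aware_weight(delta: float) -> float:
    """alpha_D = sigma(delta) / sigma(-delta)  (reconstructed Eq. 6).

    The ratio simplifies to exp(delta), so preference pairs with a
    larger CLIP-score gap (cleaner supervision) get exponentially
    larger weights; delta = 0 gives a neutral weight of 1.
    """
    return sigmoid(delta) / sigmoid(-delta)


def model_aware_weight(batch_margin: float, running_margin: float) -> float:
    """alpha_M = sigma(-R_B) / sigma(-R_bar)  (reconstructed Eq. 12).

    Batches whose reward margin already exceeds the running average
    (i.e., the model separates them well) are down-weighted.
    """
    return sigmoid(-batch_margin) / sigmoid(-running_margin)


def dynamic_beta(beta: float, alpha_d: float, alpha_m: float) -> float:
    """beta_C = beta * alpha_D * alpha_M  (reconstructed Eqs. 15-16)."""
    return beta * alpha_d * alpha_m


def update_running_margin(running_margin: float, batch_margin: float,
                          gamma: float = 0.9) -> float:
    """R_bar <- gamma * R_bar + (1 - gamma) * R_B  (Eq. 14).

    gamma = 0.9 is a placeholder value, not taken from the paper.
    """
    return gamma * running_margin + (1.0 - gamma) * batch_margin
```

In training, `dynamic_beta` would replace the fixed β = 0.1 inside the DPO loss (Eq. 2), so that each sample's KL penalty adapts to both data quality and the model's current state.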