DAMA: Data- and Model-aware Alignment of Multi-modal LLMs

Authors: Jinda Lu, Junkang Wu, Jinghan Li, Xiaojun Jia, Shuo Wang, Yifan Zhang, Junfeng Fang, Xiang Wang, Xiangnan He

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five benchmarks demonstrate that DAMA not only significantly enhances trustworthiness but also improves effectiveness on general tasks. For instance, on Object HalBench, our DAMA-7B reduces response-level and mention-level hallucination by 90.0% and 95.3%, respectively, surpassing the performance of GPT-4V.
Researcher Affiliation | Academia | 1MoE Key Lab of BIPC, University of Science and Technology of China; 2Nanyang Technological University; 3Institute of Automation, University of Chinese Academy of Sciences; 4National University of Singapore. Correspondence to: Xiang Wang <EMAIL>, Xiangnan He <EMAIL>.
Pseudocode | Yes | Algorithm 1: Algorithm of DAMA.
Input: preference dataset D, hyper-parameter β, SFT model π_SFT, CLIP classifier Γ_CLIP.
Output: the optimized model π_θ.
Initialize model π_θ and reference model π_ref as π_SFT.
for {(I, x, y_w, y_l)} in D do
    S_w ← LLM{y_w}, S_l ← LLM{y_l};
    obtain δ with {I, S_w}, {I, S_l};  Eq. (3)–(5)
    α_D ← σ(δ)/σ(−δ);  Eq. (6)
end for
repeat
    for B = {(I_i, x_i, y_{w,i}, y_{l,i})}_{i=1}^{N} ⊂ D do
        obtain R_i with y_{w,i} and y_{l,i};  Eq. (8)
        obtain R̄_B with R_i;  Eq. (9)–(11)
        α_M ← σ(R̄_B)/σ(R̄);  Eq. (12)
        α ← α_D^B ⊙ α_M, where α_D^B = {α_{D,i}}_{i=1}^{N};  Eq. (15)
        β_C ← β · α;  Eq. (16)
        compute the loss w.r.t. β_C and π_θ;  Eq. (2)
        compute the gradient and update the model π_θ;
        R̄ ← γ·R̄ + (1−γ)·R̄_B;  Eq. (14)
    end for
until the optimization converges.
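To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch of the adaptive-weighting step. All function and argument names here are hypothetical, σ(·) is the logistic sigmoid, and the batch-level margin R̄_B is computed with a plain mean as a stand-in for the paper's Eq. (9)–(11) aggregation (which involves the selected size K); this is a sketch of the weighting scheme, not the authors' implementation.

```python
import math

def sigmoid(x):
    """Logistic sigmoid σ(x)."""
    return 1.0 / (1.0 + math.exp(-x))

def data_aware_weight(delta):
    # Data-aware weight α_D = σ(δ)/σ(−δ), where δ is the CLIP-based
    # preference gap between the chosen and rejected responses (Eq. 3-6).
    # Note σ(δ)/σ(−δ) = e^δ, so large gaps are up-weighted exponentially.
    return sigmoid(delta) / sigmoid(-delta)

def adaptive_beta(beta, alpha_D, batch_margins, running_margin, gamma=0.9):
    """Per-sample penalty β_C for one batch (sketch of Eq. 12, 14-16).

    beta:           base penalty β (e.g. 0.1)
    alpha_D:        list of data-aware weights α_{D,i} for the batch
    batch_margins:  implicit reward margins R_i for the batch (Eq. 8)
    running_margin: exponential moving average R̄ of past batch margins
    Returns the list β_C and the updated running margin.
    """
    # Batch-level margin R̄_B; a simple mean replaces Eq. (9)-(11) here.
    batch_margin = sum(batch_margins) / len(batch_margins)
    # Model-aware weight α_M = σ(R̄_B)/σ(R̄) (Eq. 12).
    alpha_M = sigmoid(batch_margin) / sigmoid(running_margin)
    # α = α_D ⊙ α_M, then β_C = β · α (Eq. 15-16).
    beta_C = [beta * a * alpha_M for a in alpha_D]
    # EMA update R̄ ← γR̄ + (1 − γ)R̄_B (Eq. 14).
    running_margin = gamma * running_margin + (1 - gamma) * batch_margin
    return beta_C, running_margin
```

With a zero preference gap and zero margins, both weights collapse to 1 and β_C reduces to the fixed β of vanilla DPO, which matches the intent of the adaptive scheme.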
Open Source Code | Yes | Code is available at: https://github.com/injadlu/DAMA.
Open Datasets | Yes | Dataset: Our focus is not on preference data construction; thus we directly utilize the dataset released by (Yu et al., 2024c), which contains 22k preference data in total.
Dataset Splits | No | The paper states: "Dataset: Our focus is not on the preference data construction, thus we directly utilize the released dataset by (Yu et al., 2024c), which contains 22k preference data totally." and "For both LLaVA-1.5 7B and 13B models, we employ full parameter-tuning over the preference dataset with four epochs." While a dataset is used, the paper does not specify any train/validation/test splits for its own experiments, nor does it refer to predefined splits for the 22k preference data from the cited work.
Hardware Specification | Yes | All experiments are conducted with four A100 80GB GPUs, and four epochs of fine-tuning cost seven hours for both backbones.
Software Dependencies | No | The paper mentions "we adopt the same hyperparameters as provided in the official LLaVA GitHub repository" but does not provide specific version numbers for any software components (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Implementation Details. For both LLaVA-1.5 7B and 13B models, we employ full parameter-tuning over the preference dataset for four epochs. Specifically, for reproducibility, we adopt the same hyperparameters as provided in the official LLaVA GitHub repository. The batch size N is set to 16, the selected size K is set to 12, and the penalty hyperparameter β is set to 0.1, following (Rafailov et al., 2024; Yu et al., 2024c).
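Since the adaptive penalty β_C ultimately enters a DPO-style objective (Eq. (2), with β = 0.1 as the base penalty in the setup above), a minimal per-sample sketch of that loss may help ground the hyperparameters. Argument names are hypothetical; log-probabilities would come from the policy π_θ and the frozen reference π_ref in practice.

```python
import math

def dpo_loss(beta_c, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Per-sample DPO loss with an adaptive penalty β_C (sketch of Eq. 2).

    L = −log σ(β_C · [(log π_θ(y_w) − log π_ref(y_w))
                      − (log π_θ(y_l) − log π_ref(y_l))])
    """
    # Implicit reward margin between the chosen (y_w) and rejected (y_l) responses.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Negative log-sigmoid of the β_C-scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta_c * margin)))
```

At a zero margin the loss equals log 2 regardless of β_C, and a larger β_C sharpens how strongly a given margin is rewarded or penalized, which is the lever DAMA's per-sample β_C adjusts.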