DAMA: Data- and Model-aware Alignment of Multi-modal LLMs

Authors: Jinda Lu, Junkang Wu, Jinghan Li, Xiaojun Jia, Shuo Wang, Yifan Zhang, Junfeng Fang, Xiang Wang, Xiangnan He

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five benchmarks demonstrate that DAMA not only significantly enhances trustworthiness but also improves effectiveness on general tasks. For instance, on Object HalBench, our DAMA-7B reduces response-level and mention-level hallucination by 90.0% and 95.3%, respectively, surpassing the performance of GPT-4V.
Researcher Affiliation | Academia | 1MoE Key Lab of BIPC, University of Science and Technology of China; 2Nanyang Technological University; 3Institute of Automation, University of Chinese Academy of Sciences; 4National University of Singapore. Correspondence to: Xiang Wang <EMAIL>, Xiangnan He <EMAIL>.
Pseudocode | Yes | Algorithm 1: Algorithm of DAMA.
Input: preference dataset D, hyper-parameter β, SFT model π_SFT, CLIP classifier Γ_CLIP.
Output: the optimized model π_θ.
Initialize model π_θ and reference model π_ref as π_SFT.
for {(I, x, y_w, y_l)} in D do
    S_w ← LLM{y_w}, S_l ← LLM{y_l};
    obtain δ with {I, S_w}, {I, S_l};  Eq. (3)–(5)
    α_D ← σ(δ)/σ(−δ);  Eq. (6)
end for
repeat
    for B = {(I_i, x_i, y_{w,i}, y_{l,i})}_{i=1}^{N} ⊂ D do
        obtain R_i with y_{w,i} and y_{l,i};  Eq. (8)
        obtain R̄_B with R_i;  Eq. (9)–(11)
        α_M ← σ(R̄_B)/σ(R̄);  Eq. (12)
        α ← α_D^B ⊙ α_M, where α_D^B = {α_{D,i}}_{i=1}^{N};  Eq. (15)
        β_C ← β · α;  Eq. (16)
        compute the loss w.r.t. β_C and π_θ;  Eq. (2)
        compute the gradient and update the model π_θ;
        R̄ ← γ·R̄ + (1−γ)·R̄_B;  Eq. (14)
    end for
until the optimization converges.
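To make the control flow of Algorithm 1 concrete, here is a minimal Python sketch of the adaptive-weighting step. All function and argument names here are hypothetical, σ(·) is the logistic sigmoid, and the batch-level margin R̄_B is computed with a plain mean as a stand-in for the paper's Eq. (9)–(11) aggregation (which involves the selected size K); this is a sketch of the weighting scheme, not the authors' implementation.

```python
import math

def sigmoid(x):
    """Logistic sigmoid σ(x)."""
    return 1.0 / (1.0 + math.exp(-x))

def data_aware_weight(delta):
    # Data-aware weight α_D = σ(δ)/σ(−δ), where δ is the CLIP-based
    # preference gap between the chosen and rejected responses (Eq. 3-6).
    # Note σ(δ)/σ(−δ) = e^δ, so large gaps are up-weighted exponentially.
    return sigmoid(delta) / sigmoid(-delta)

def adaptive_beta(beta, alpha_D, batch_margins, running_margin, gamma=0.9):
    """Per-sample penalty β_C for one batch (sketch of Eq. 12, 14-16).

    beta:           base penalty β (e.g. 0.1)
    alpha_D:        list of data-aware weights α_{D,i} for the batch
    batch_margins:  implicit reward margins R_i for the batch (Eq. 8)
    running_margin: exponential moving average R̄ of past batch margins
    Returns the list β_C and the updated running margin.
    """
    # Batch-level margin R̄_B; a simple mean replaces Eq. (9)-(11) here.
    batch_margin = sum(batch_margins) / len(batch_margins)
    # Model-aware weight α_M = σ(R̄_B)/σ(R̄) (Eq. 12).
    alpha_M = sigmoid(batch_margin) / sigmoid(running_margin)
    # α = α_D ⊙ α_M, then β_C = β · α (Eq. 15-16).
    beta_C = [beta * a * alpha_M for a in alpha_D]
    # EMA update R̄ ← γR̄ + (1 − γ)R̄_B (Eq. 14).
    running_margin = gamma * running_margin + (1 - gamma) * batch_margin
    return beta_C, running_margin
```

With a zero preference gap and zero margins, both weights collapse to 1 and β_C reduces to the fixed β of vanilla DPO, which matches the intent of the adaptive scheme.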
Open Source Code | Yes | Code is available at: https://github.com/injadlu/DAMA.
Open Datasets | Yes | Dataset: Our focus is not on preference data construction; thus we directly utilize the dataset released by (Yu et al., 2024c), which contains 22k preference data in total.
Dataset Splits | No | The paper states: "Dataset: Our focus is not on the preference data construction, thus we directly utilize the released dataset by (Yu et al., 2024c), which contains 22k preference data totally." and "For both LLaVA-1.5 7B and 13B models, we employ full parameter-tuning over the preference dataset with four epochs." While a dataset is used, the paper does not specify any train/validation/test splits for its own experiments, nor does it refer to predefined splits for the 22k preference data from the cited work.
Hardware Specification | Yes | All experiments are conducted with four A100 80GB GPUs, and four epochs of fine-tuning cost seven hours for both backbones.
Software Dependencies | No | The paper mentions "we adopt the same hyperparameters as provided in the official LLaVA GitHub repository" but does not provide specific version numbers for any software components (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Implementation Details. For both LLaVA-1.5 7B and 13B models, we employ full parameter-tuning over the preference dataset for four epochs. Specifically, for reproducibility, we adopt the same hyperparameters as provided in the official LLaVA GitHub repository. The batch size N is set to 16, the selected size K is set to 12, and the penalty hyperparameter β is set to 0.1, following (Rafailov et al., 2024; Yu et al., 2024c).
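Since the adaptive penalty β_C ultimately enters a DPO-style objective (Eq. (2), with β = 0.1 as the base penalty in the setup above), a minimal per-sample sketch of that loss may help ground the hyperparameters. Argument names are hypothetical; log-probabilities would come from the policy π_θ and the frozen reference π_ref in practice.

```python
import math

def dpo_loss(beta_c, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Per-sample DPO loss with an adaptive penalty β_C (sketch of Eq. 2).

    L = −log σ(β_C · [(log π_θ(y_w) − log π_ref(y_w))
                      − (log π_θ(y_l) − log π_ref(y_l))])
    """
    # Implicit reward margin between the chosen (y_w) and rejected (y_l) responses.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Negative log-sigmoid of the β_C-scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta_c * margin)))
```

At a zero margin the loss equals log 2 regardless of β_C, and a larger β_C sharpens how strongly a given margin is rewarded or penalized, which is the lever DAMA's per-sample β_C adjusts.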