DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations
Authors: Longtian Qiu, Shan Ning, Chuyu Zhang, Jiaxuan Sun, Xuming He
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that DA-DPO consistently improves multimodal preference optimization, yielding stronger robustness to hallucinations and better generalization across standard benchmarks, while remaining computationally efficient. We conduct experiments on three popular MLLMs with different scales and abilities. To provide a comprehensive comparison, we report the performance comparison and analysis on two sets of benchmarks, hallucination benchmarks (Wang et al., 2023a; Rohrbach et al., 2018; Sun et al., 2023b; Li et al., 2023e) and general MLLM benchmarks (Hudson & Manning, 2019; Liu et al., 2023b; Fu et al., 2023; Li et al., 2023a), which demonstrate the effectiveness of our approach. |
| Researcher Affiliation | Academia | Longtian Qiu EMAIL ShanghaiTech University, Shanghai, China; Shan Ning EMAIL ShanghaiTech University, Shanghai, China, Lingang Laboratory, Shanghai, China; Chuyu Zhang EMAIL ShanghaiTech University, Shanghai, China; Jiaxuan Sun EMAIL ShanghaiTech University, Shanghai, China; Xuming He EMAIL ShanghaiTech University, Shanghai, China, Shanghai Engineering Research Center of Intelligent Vision and Imaging |
| Pseudocode | No | The paper describes methods and objectives using mathematical equations and descriptive text, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | No | The project page is available at https://artanic30.github.io/project_pages/DA-DPO. This link points to a project overview page rather than a direct code repository as required by the criteria for 'Yes'. |
| Open Datasets | Yes | To validate the effectiveness of the proposed methods, we use the pair preference datasets from BPO (Pi et al., 2024). This dataset contains 180k pairwise preference data... Hallucination Evaluation. Following previous works (Pi et al., 2024; Wang et al., 2024a; Ouali et al., 2024), we comprehensively evaluate the DA-DPO on various hallucination benchmarks such as AMBER (Wang et al., 2023a), MMHal-Bench (Sun et al., 2023b), Object HalBench (Rohrbach et al., 2018), and POPE (Li et al., 2023e). |
| Dataset Splits | Yes | We split the dataset into a training set and a held-out validation set with a ratio of 90% to 10%. |
| Hardware Specification | Yes | Wall-clock hours are estimated on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions models like LLaVA V1.5, LoRA, CLIP, and EVA-CLIP but does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For training parameters, we train the model for 1 epoch and set the β to 0.2. For LLaVA V1.5, we follow previous work (Pi et al., 2024) to adopt the LoRA (Hu et al., 2021) training with rank 32 and LoRA alpha 256. The learning rate is set to 2e-6. For LLaVA-OneVision, we use the recommended official training script to perform full finetuning where the learning rate is 5e-7. |
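To make the reported β = 0.2 concrete, the sketch below shows the standard DPO objective (Rafailov et al., 2023) that the β hyperparameter scales. This is a generic, minimal illustration of plain DPO on per-response log-probabilities, not the authors' DA-DPO code; all function and variable names here are our own.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.2) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a response
    under the trainable policy (logp_*) or the frozen reference
    model (ref_logp_*). beta=0.2 matches the paper's setting.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written as log1p(exp(-x)) for numerical stability.
    return math.log1p(math.exp(-x)) if x > -30.0 else -x

# At zero margin the loss is log(2); it shrinks as the policy learns
# to prefer the chosen response more strongly than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

A larger β sharpens the penalty for preferring the rejected response, which is why it is commonly reported alongside the learning rate in DPO training setups.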