DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations
Authors: Longtian Qiu, Shan Ning, Chuyu Zhang, Jiaxuan Sun, Xuming He
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that DA-DPO consistently improves multimodal preference optimization, yielding stronger robustness to hallucinations and better generalization across standard benchmarks, while remaining computationally efficient. We conduct experiments on three popular MLLMs with different scales and abilities. To provide a comprehensive comparison, we report the performance comparison and analysis on two sets of benchmarks, hallucination benchmarks (Wang et al., 2023a; Rohrbach et al., 2018; Sun et al., 2023b; Li et al., 2023e) and general MLLM benchmarks (Hudson & Manning, 2019; Liu et al., 2023b; Fu et al., 2023; Li et al., 2023a), which demonstrate the effectiveness of our approach. |
| Researcher Affiliation | Academia | Longtian Qiu EMAIL ShanghaiTech University, Shanghai, China; Shan Ning EMAIL ShanghaiTech University, Shanghai, China, Lingang Laboratory, Shanghai, China; Chuyu Zhang EMAIL ShanghaiTech University, Shanghai, China; Jiaxuan Sun EMAIL ShanghaiTech University, Shanghai, China; Xuming He EMAIL ShanghaiTech University, Shanghai, China, Shanghai Engineering Research Center of Intelligent Vision and Imaging |
| Pseudocode | No | The paper describes methods and objectives using mathematical equations and descriptive text, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | No | The project page is available at https://artanic30.github.io/project_pages/DA-DPO. This link points to a project overview page rather than a direct code repository as required by the criteria for 'Yes'. |
| Open Datasets | Yes | To validate the effectiveness of the proposed methods, we use the pair preference datasets from BPO (Pi et al., 2024). This dataset contains 180k pairwise preference data... Hallucination Evaluation. Following previous works (Pi et al., 2024; Wang et al., 2024a; Ouali et al., 2024), we comprehensively evaluate the DA-DPO on various hallucination benchmarks such as AMBER (Wang et al., 2023a), MMHal-Bench (Sun et al., 2023b), Object HalBench (Rohrbach et al., 2018), and POPE (Li et al., 2023e). |
| Dataset Splits | Yes | We split the dataset into a training set and a held-out validation set with a ratio of 90% to 10%. |
| Hardware Specification | Yes | Wall-clock hours are estimated on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions models like LLaVA V1.5, LoRA, CLIP, and EVA-CLIP but does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For training parameters, we train the model for 1 epoch and set the β to 0.2. For LLaVA V1.5, we follow previous work (Pi et al., 2024) to adopt the LoRA (Hu et al., 2021) training with rank 32 and LoRA alpha 256. The learning rate is set to 2e-6. For LLaVA-OneVision, we use the recommended official training script to perform full finetuning where the learning rate is 5e-7. |
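To make the reported β = 0.2 concrete, the sketch below shows the standard DPO objective (Rafailov et al., 2023) that the β hyperparameter scales. This is a generic, minimal illustration of plain DPO on per-response log-probabilities, not the authors' DA-DPO code; all function and variable names here are our own.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.2) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a response
    under the trainable policy (logp_*) or the frozen reference
    model (ref_logp_*). beta=0.2 matches the paper's setting.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written as log1p(exp(-x)) for numerical stability.
    return math.log1p(math.exp(-x)) if x > -30.0 else -x

# At zero margin the loss is log(2); it shrinks as the policy learns
# to prefer the chosen response more strongly than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

A larger β sharpens the penalty for preferring the rejected response, which is why it is commonly reported alongside the learning rate in DPO training setups.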