MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences

Authors: Weitao Wang, Haoran Xu, Yuxiao Yang, Zhifang Liu, Jun Meng, Haoqian Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that MVReward can serve as a reliable metric and MVP consistently enhances the alignment of multi-view diffusion models with human preferences. We conduct a user study to evaluate MVReward's ability in predicting human preferences. We perform ablation studies on the encoder backbone, multi-view self-attention, and negative samples to assess their effects on MVReward.
Researcher Affiliation | Academia | (1) Tsinghua Shenzhen International Graduate School, Tsinghua University; (2) Zhejiang University
Pseudocode | Yes | Algorithm 1: Multi-View Preference Learning (MVP) for Multi-View DMs
Open Source Code | Yes | Code: https://github.com/victor-thu/MVReward
Open Datasets | Yes | We begin by generating and filtering a standardized image prompt set from DALL-E (Ramesh et al. 2021) and Objaverse (Deitke et al. 2023), ensuring the object(s) in each image are fully visible with well-designed geometry and texture. Furthermore, taking the widely-used GSO dataset (Downs et al. 2022) as an example…
Dataset Splits | Yes | The training, validation and test datasets are split according to an 8:1:1 ratio.
Hardware Specification | Yes | Optimal performance is achieved with a batch size of 96 in total, an initial learning rate of 4e-5 using cosine annealing, on 4 NVIDIA Quadro RTX 8000. Both models are fine-tuned in half-precision on 8 NVIDIA Quadro RTX 8000, with a batch size of 128 in total and a learning rate of 5e-6 with warm-up.
Software Dependencies | No | The paper mentions BLIP and ViT-B as pre-trained models but does not specify version numbers for any software libraries or dependencies used in the implementation.
Experiment Setup | Yes | Optimal performance is achieved with a batch size of 96 in total, an initial learning rate of 4e-5 using cosine annealing, on 4 NVIDIA Quadro RTX 8000. Both models are fine-tuned in half-precision on 8 NVIDIA Quadro RTX 8000, with a batch size of 128 in total and a learning rate of 5e-6 with warm-up. The model parameters are fixed except for the designated trainable modules within the UNet.
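The 8:1:1 train/validation/test split reported above can be sketched as a shuffled partition. This is a minimal illustration, not the authors' code; the function name, seed, and shuffling strategy are assumptions:

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items deterministically, then partition them 8:1:1
    into train/validation/test subsets (assumed implementation)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

Fixing the seed keeps the split reproducible across runs, which matters when the same partition must be reused for ablations.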
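The reported learning-rate settings (cosine annealing at 4e-5 for reward-model training, 5e-6 with warm-up for fine-tuning) can be sketched as one schedule combining linear warm-up with cosine decay. This is an illustrative reconstruction, not the paper's implementation; the warm-up length and the pairing of warm-up with cosine decay are assumptions:

```python
import math

def lr_at_step(step, total_steps, base_lr=4e-5, warmup_steps=0):
    """Learning rate at a given step: optional linear warm-up,
    then cosine annealing from base_lr down toward zero.

    base_lr=4e-5 matches the reward-model setting; the fine-tuning
    stage would use base_lr=5e-6 with warmup_steps > 0 (length assumed).
    """
    if warmup_steps and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warm-up
    # cosine annealing over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

# Fine-tuning-style schedule: ramps up over 500 steps, then decays.
print(lr_at_step(250, 10_000, base_lr=5e-6, warmup_steps=500))
```

In practice this would drive an optimizer's learning rate each step (e.g., via a PyTorch `LambdaLR`); the closed-form function above just makes the shape of the schedule explicit.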