PanoDiT: Panoramic Videos Generation with Diffusion Transformer

Authors: Muyang Zhang, Yuzhi Chen, Rongtao Xu, Changwei Wang, Jinming Yang, Weiliang Meng, Jianwei Guo, Huihuang Zhao, Xiaopeng Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared to previous methods, our PanoDiT achieves state-of-the-art performance across various evaluation metrics and a user study, with code available in the supplementary material. The training configuration featured a resolution of 512 × 1024, a frame length of 144, a batch size of 2, a learning rate of 5 × 10⁻⁶, and a total of 100,000 training steps. We conducted comparative experiments using AnimateDiff, 360DVD, and SVD, all of which were trained under identical conditions on WEB360 and our PHQ360 dataset to ensure a fair comparison. Quantitative Results. The quantitative results are given in Table 1. We report not only standard metrics for video evaluation, such as Fréchet Video Distance (Unterthiner et al. 2018) (FVD), but also Fréchet Inception Distance (Heusel et al. 2017) (FID) and Inception Score (IS) for individual frames of ERP videos.
Researcher Affiliation | Academia | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 2. MAIS, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 3. Qilu University of Technology (Shandong Academy of Sciences), Shandong, China; 4. School of Artificial Intelligence, Beijing Normal University, Beijing, China; 5. College of Computer Science and Technology, Hengyang Normal University, Hunan, China
Pseudocode | No | The paper describes methods in prose and mathematical equations but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Compared to previous methods, our PanoDiT achieves state-of-the-art performance across various evaluation metrics and a user study, with code available in the supplementary material.
Open Datasets | Yes | We construct a novel Panoramic High-Quality 360 (PHQ360) Dataset based on WEB360, which has been meticulously refined using aesthetic and motion scoring, along with Likert scale-based human evaluation. In previous work introducing text-to-panoramic video datasets, datasets like ODV360 (Cao et al. 2023) and WEB360 (Wang et al. 2024) were developed.
Dataset Splits | No | The training configuration featured a resolution of 512 × 1024, a frame length of 144, a batch size of 2, a learning rate of 5 × 10⁻⁶, and a total of 100,000 training steps. This specifies training parameters, but no explicit training/validation/test splits or their percentages/counts are given for PHQ360 or WEB360.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU models or CPU specifications.
Software Dependencies | No | The paper does not list specific software components with their version numbers, such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | The training configuration featured a resolution of 512 × 1024, a frame length of 144, a batch size of 2, a learning rate of 5 × 10⁻⁶, and a total of 100,000 training steps. We trained PanoDiT at three different scales: Small (S), Base (B), and Large (L) using our PHQ360 dataset.
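The reported hyperparameters can be collected into a single configuration object. A minimal sketch, assuming the values quoted above; the class and field names are illustrative, not taken from the authors' code:

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical container for the training setup reported in the paper."""
    resolution: tuple = (512, 1024)  # height x width of ERP frames
    num_frames: int = 144            # frame length per video clip
    batch_size: int = 2
    learning_rate: float = 5e-6
    total_steps: int = 100_000
    model_scale: str = "B"           # one of "S", "B", "L" per the paper


cfg = TrainConfig()
```

Such a dataclass makes the reported setup easy to log and compare against a reimplementation, which is the main concern of a reproducibility check.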
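Both FID and FVD cited in the quantitative results reduce to the Fréchet distance between two Gaussians fitted to feature statistics (Inception features for FID, video-network features for FVD). A minimal sketch of that core computation, assuming the feature means and covariances have already been extracted; the function name is ours:

```python
import numpy as np
from scipy.linalg import sqrtm


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    # sqrtm can return tiny imaginary parts from numerical error; drop them.
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical statistics give a distance of 0; the metric grows as the generated-feature distribution drifts from the real one, so lower FID/FVD is better.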