Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning

Authors: Daoming Zong, Chaoyue Ding, Kaitao Chen, Yinsheng Li, Shuaiyu Wang

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments validate the effectiveness and generalizability of CF-PACR, demonstrating considerable improvements over traditional PACR models using counterfactual inference. |
| Researcher Affiliation | Collaboration | SenseTime Research; School of Computer Science, Fudan University, Shanghai, China. |
| Pseudocode | No | The paper describes the CF-PACR framework conceptually and mathematically but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | PACS (Yu et al. 2022) is a video-based audiovisual benchmark designed to evaluate a model's ability to reason about physical commonsense using audio and visual modalities. Yu, S.; Wu, P.; Liang, P. P.; Salakhutdinov, R.; and Morency, L.-P. 2022. PACS: A Dataset for Physical Audiovisual Commonsense Reasoning. arXiv preprint arXiv:2203.11130. |
| Dataset Splits | Yes | The training, validation, and test sets for PACS-QA consist of 11,044, 1,192, and 1,164 samples respectively. For PACS-Material, the training, validation, and test sets comprise 3,460, 444, and 445 samples respectively. |
| Hardware Specification | Yes | All variants were trained on four NVIDIA Tesla V100 GPUs with a batch size of 16, 30 epochs, a weight decay of 1e-4, and an initial learning rate of 1e-3. |
| Software Dependencies | No | The paper mentions several pre-trained models and frameworks (e.g., CLIP, AudioCLIP, MERLOT Reserve, ViT, AST, TDN, DeBERTa-V3) but does not provide version numbers for ancillary software such as Python, PyTorch, or TensorFlow that would be needed to replicate the experiment. |
| Experiment Setup | Yes | All variants were trained on four NVIDIA Tesla V100 GPUs with a batch size of 16, 30 epochs, a weight decay of 1e-4, and an initial learning rate of 1e-3. For the CF-PACR framework, the hyperparameters α, β, γ, and τ were tuned within [0, 1] at 0.1 intervals. |
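The hyperparameter sweep reported in the Experiment Setup row (α, β, γ, and τ each tuned within [0, 1] at 0.1 intervals) amounts to an exhaustive grid search. A minimal sketch follows; `evaluate` is a hypothetical stand-in for CF-PACR's validation metric and is not code from the paper:

```python
import itertools

def grid_values(step=0.1, lo=0.0, hi=1.0):
    """Grid points [0.0, 0.1, ..., 1.0], rounded to avoid float drift."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(n + 1)]

def tune(evaluate):
    """Exhaustively search (alpha, beta, gamma, tau) over the grid.

    `evaluate` is assumed to return a validation score (higher is better);
    the full grid has 11**4 = 14,641 combinations.
    """
    best_score, best_params = float("-inf"), None
    for params in itertools.product(grid_values(), repeat=4):
        score = evaluate(*params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

In practice such a sweep would be run once per task (e.g., PACS-QA and PACS-Material separately), selecting the combination that maximizes validation accuracy.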