Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning
Authors: Daoming Zong, Chaoyue Ding, Kaitao Chen, Yinsheng Li, Shuaiyu Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the effectiveness and generalizability of CF-PACR, demonstrating considerable improvements over traditional PACR models using counterfactual inference. |
| Researcher Affiliation | Collaboration | ¹SenseTime Research; ²School of Computer Science, Fudan University, Shanghai, China |
| Pseudocode | No | The paper describes the CF-PACR framework conceptually and mathematically but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is released or provide a link to a code repository. |
| Open Datasets | Yes | PACS (Yu et al. 2022) is a video-based audiovisual benchmark designed to evaluate the model's ability to reason about physical commonsense using audio and visual modalities. Yu, S.; Wu, P.; Liang, P. P.; Salakhutdinov, R.; and Morency, L.-P. 2022. PACS: A Dataset for Physical Audiovisual Commonsense Reasoning. arXiv preprint arXiv:2203.11130. |
| Dataset Splits | Yes | The training, validation, and test sets for PACS-QA consist of 11,044, 1,192, and 1,164 samples respectively. For PACS-Material, the training, validation, and test sets comprise 3,460, 444, and 445 samples respectively. |
| Hardware Specification | Yes | All variants were trained on four NVIDIA Tesla V100 GPUs with a batch size of 16, 30 epochs, a weight decay of 1e-4, and an initial learning rate of 1e-3. |
| Software Dependencies | No | The paper mentions several pre-trained models and frameworks used (e.g., CLIP, AudioCLIP, MERLOT Reserve, ViT, AST, TDN, DeBERTa-V3) but does not provide specific version numbers for general ancillary software like Python, PyTorch, or TensorFlow that would be needed to replicate the experiment. |
| Experiment Setup | Yes | All variants were trained on four NVIDIA Tesla V100 GPUs with a batch size of 16, 30 epochs, a weight decay of 1e-4, and an initial learning rate of 1e-3. For the CF-PACR framework, hyperparameters α, β, γ, and τ were tuned within [0, 1] at 0.1 intervals. |
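The reported tuning of α, β, γ, and τ over [0, 1] at 0.1 intervals implies an exhaustive grid of 11⁴ = 14,641 settings. The sketch below shows that grid search pattern; it is an illustration only, and the `evaluate` callback is a hypothetical stand-in for whatever validation-score routine the paper used, which is not specified.

```python
from itertools import product

# The paper tunes alpha, beta, gamma, and tau within [0, 1] at 0.1 intervals.
# Values are built as i/10 (rounded) rather than by repeated addition,
# to avoid accumulating floating-point drift.
GRID = [round(i / 10, 1) for i in range(11)]  # [0.0, 0.1, ..., 1.0]

def grid_search(evaluate):
    """Exhaustively try every (alpha, beta, gamma, tau) combination.

    `evaluate` is a hypothetical callback returning a validation score
    (higher is better) for one hyperparameter setting.
    """
    best_score, best_cfg = float("-inf"), None
    for alpha, beta, gamma, tau in product(GRID, repeat=4):
        score = evaluate(alpha, beta, gamma, tau)
        if score > best_score:
            best_score, best_cfg = score, (alpha, beta, gamma, tau)
    return best_cfg, best_score
```

For example, passing a toy scorer that peaks at a known point, such as `grid_search(lambda a, b, g, t: -abs(a - 0.3) - abs(t - 0.7))`, returns a configuration with `alpha == 0.3` and `tau == 0.7`, confirming the grid covers those values exactly.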