The Role of Video Generation in Enhancing Data-Limited Action Understanding
Authors: Wei Li, Dezhao Luo, Dongbao Yang, Zhenhang Li, Weiping Wang, Yu Zhou
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through quantitative and qualitative analysis, we observed that real samples generally contain a richer level of information than generated samples. We conducted extensive experiments on four datasets across five tasks and achieved state-of-the-art performance for zero-shot action recognition." Section 4 (Experiment) comprises 4.1 Implementation Details, 4.2 Main Results (Zero-shot Action Recognition, ...), and 4.3 Ablation Studies. |
| Researcher Affiliation | Academia | 1 Institute of Information Engineering, Chinese Academy of Sciences; 2 VCIP & TMCC & DISSec, College of Computer Science, Nankai University; 3 School of Cyber Security, University of Chinese Academy of Sciences; 4 Queen Mary University of London |
| Pseudocode | No | The paper describes the methods in text and uses flowcharts/diagrams in Figure 2 to illustrate the process, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We conducted extensive experiments on four datasets (Kinetics-600, UCF-101, HMDB-51, UCF-Crime) |
| Dataset Splits | Yes | We conducted the few-shot action recognition experiments with the UCF-101 and HMDB-51 datasets in Table 2. We first pre-train the model with generated samples and then fine-tune it on each dataset with only K samples per category, where K is in 2, 4, 8 and 16. In each dataset, 16 samples per category are selected from half of the classes to construct the base split for training, while the remaining half of the categories serve as the novel split for evaluation. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions specific models and APIs (CogVideoX-2B, GPT-4o, TC-CLIP, X-CLIP-B/32) but does not provide specific version numbers for underlying software libraries, frameworks (e.g., Python, PyTorch, TensorFlow, CUDA) or other ancillary software dependencies. |
| Experiment Setup | Yes | For each dataset, we generate 128 videos for each category with 50 inference steps. We set the w to 0.3 in uncertainty-based label smoothing. For tasks where real samples are not available such as zero-shot, we train the model with synthetic samples only. For tasks where real samples are available such as few-shot, long-tail, etc., we pre-train the model with synthetic samples and then fine-tune with the real samples. |
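The quoted setup mentions "uncertainty-based label smoothing" with w = 0.3 but does not spell out the smoothing rule. The sketch below shows plain weighted label smoothing with a fixed w, which is only an assumption about the baseline form; the paper's uncertainty-based variant presumably modulates w per generated sample, and that logic is not reproduced here.

```python
import numpy as np

def smooth_labels(labels, num_classes, w=0.3):
    """Weighted label smoothing (illustrative sketch only).

    Mixes hard one-hot targets with a uniform distribution using
    weight w. The paper's uncertainty-based variant likely adapts
    w per sample; here w is fixed at 0.3, matching the quoted setup.
    """
    one_hot = np.eye(num_classes)[labels]            # (N, C) hard targets
    uniform = np.full_like(one_hot, 1.0 / num_classes)
    return (1.0 - w) * one_hot + w * uniform         # softened targets

# Example: two samples with class indices 0 and 2, four classes.
targets = smooth_labels(np.array([0, 2]), num_classes=4, w=0.3)
```

With w = 0.3 and four classes, the true class receives 0.7 + 0.3/4 = 0.775 and each other class 0.075, so every row still sums to 1.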