Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark

Authors: Bing Cao, Quanhao Lu, Jiekang Feng, Qilong Wang, Pengfei Zhu, Qinghua Hu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on three crowd datasets and our Drone Bird validate our superiority over the counterparts. The code and dataset are available.¹"
Researcher Affiliation | Academia | Bing Cao, Quanhao Lu, Jiekang Feng, Qilong Wang, Qinghua Hu, Pengfei Zhu — Tianjin University — {caobing,luquanhao,fengjiekang,qlwang,huqinghua,zhupengfei}@tju.edu.cn
Pseudocode | Yes | "Algorithm 1: Framework Workflow in Training Phase"; "Algorithm 2: DEMO workflow in training phase"
Open Source Code | Yes | "The code and dataset are available.¹" (¹ https://github.com/mast1ren/E-MAC)
Open Datasets | Yes | "We first propose a large video bird counting dataset, Drone Bird, in natural scenarios for migratory bird protection. Extensive experiments on three crowd datasets and our Drone Bird validate our superiority over the counterparts. The code and dataset are available.¹" (¹ https://github.com/mast1ren/E-MAC) "We conduct experiments on our Drone Bird dataset and three video object counting datasets: Fudan-ShanghaiTech (FDST) (Fang et al., 2019), Mall (Loy et al., 2013), and VSCrowd (Li et al., 2022)."
Dataset Splits | Yes | "We cut the 40 videos in the train and test sets to 500 frames per video (around 17 s), and cut the 10 videos in the validation set to 150 frames per video (around 5 s) to accomplish a reasonable data division. After division, the train, test, and validation sets contain 15,000, 5,000, and 1,500 frames, respectively. For the Mall dataset, we follow the previous works (Bai & Chan, 2021; Hossain et al., 2020) for a fair comparison. The model is trained with the first 800 frames of the Mall dataset, and the remaining 1,200 frames are used as the test set."
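The quoted frame counts determine the video-level split implicitly: the sketch below checks that the numbers are mutually consistent, assuming the 40 train/test videos divide 30/10 (inferred from 15,000 / 500 and 5,000 / 500; the passage does not state this split explicitly).

```python
# Sanity check of the Drone Bird split described above.
# 30/10 is an inferred division of "the 40 videos in the train and test sets".
train_videos, test_videos, val_videos = 30, 10, 10
frames_train_test = 500   # frames per train/test video after cutting
frames_val = 150          # frames per validation video after cutting

train_frames = train_videos * frames_train_test
test_frames = test_videos * frames_train_test
val_frames = val_videos * frames_val

assert train_frames == 15_000
assert test_frames == 5_000
assert val_frames == 1_500
assert train_videos + test_videos == 40  # matches the quoted video count
```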
Hardware Specification | Yes | "Our experiments are conducted on a Huawei Atlas 800 Training Server with CANN and an NVIDIA RTX 3090 GPU."
Software Dependencies | No | The paper mentions a "Huawei Atlas 800 Training Server with CANN" and an "NVIDIA RTX 3090 GPU". While CANN is a software stack, no version numbers are given for CANN or for any other key software dependency (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "For hyperparameter settings, the model employs a linear learning rate warm-up for the first 15 epochs, followed by a cosine decay learning rate. The weight decay of AdamW is set to 0.05, and layer decay is set to 0.75 for the encoder. The mask ratio is 0.72. ... The probability P for spatial adaptive masking is set to 0.2. The trade-off parameters λ1, λ2, λ3, λ4 are set to 10, 10, 1, and 20, respectively. The input images are set to the size of 448 × 640 and the batch size is set to 3. We construct the same architecture as that in the comparison experiments and train for 200 epochs."
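The quoted warm-up/cosine-decay schedule can be sketched in plain Python; only the shape (15 warm-up epochs, cosine decay over 200 epochs) comes from the paper, while `BASE_LR` is an assumed placeholder since no base learning rate is quoted.

```python
import math

# Schedule quoted above: linear warm-up for 15 epochs, then cosine decay.
EPOCHS = 200
WARMUP_EPOCHS = 15
BASE_LR = 1e-4  # assumption for illustration; the excerpt does not state the base LR

def learning_rate(epoch: int) -> float:
    """Per-epoch learning rate: linear warm-up, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Such a schedule reaches the base rate exactly at the end of warm-up and decays smoothly to near zero by epoch 200, matching the "linear warm-up ... followed by a cosine decay" description.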