Efficient Self-Supervised Video Hashing with Selective State Spaces

Authors: Jinpeng Wang, Niu Lian, Jun Li, Yuting Wang, Yan Feng, Bin Chen, Yongbing Zhang, Shu-Tao Xia

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate S5VH's improvements over state-of-the-art methods, superior transferability, and scalable advantages in inference efficiency. ... We conduct extensive experiments on 4 datasets: ActivityNet, FCVID, UCF101, and HMDB51, demonstrating that S5VH outperforms state-of-the-art baselines under various setups and transfers better across datasets. ... Additionally, we provide comprehensive ablations and analyses, focusing on network architecture and training strategy."
Researcher Affiliation | Collaboration | "1Tsinghua Shenzhen International Graduate School, Tsinghua University; 2Harbin Institute of Technology, Shenzhen; ... 4Meituan, Beijing"
Pseudocode | No | The paper describes an optimization problem for hash center generation using equations and prose (e.g., the "Optimization for Hash Center Generation" section), but it does not present this as structured pseudocode or an algorithm block.
Open Source Code | Yes | "Code: https://github.com/gimpong/AAAI25-S5VH"
Open Datasets | Yes | "We conduct experiments on 4 benchmark datasets. (i) ActivityNet (Caba Heilbron et al. 2015) ... (ii) FCVID (Jiang et al. 2017) ... (iii) UCF101 (Soomro, Zamir, and Shah 2012) ... (iv) HMDB51 (Kuehne et al. 2011)"
Dataset Splits | Yes | "(i) ActivityNet ... using 9,722 videos for training. We uniformly sample 1,000 videos across 200 categories in the validation set as queries, and the remaining 3,758 videos as the database. ... (iii) UCF101 ... We use 9,537 videos for training and the database, and 3,783 videos from the test set as the query set. (iv) HMDB51 ... We use 3,570 videos for both training and the database, and 1,530 videos from the test set are designated as the query set."
Hardware Specification | No | "We perform stress testing with them in the same computational environment, taking 5 samples as a unit to probe the maximally affordable batch sizes and measuring the average inference time per sample."
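The stress-testing protocol quoted above (grow the batch size in units of 5 until it no longer fits, then time inference at the largest affordable batch) can be sketched as below. This is an illustrative reconstruction, not the authors' code; the function names, the probe ceiling, and the warm-up/repeat counts are assumptions, and `infer(batch_size)` stands in for one forward pass of the model under test.

```python
import time

def probe_and_time(infer, max_batch_limit=100, unit=5, warmup=2, repeats=10):
    """Sketch of the stress-testing protocol (parameters are assumptions):
    probe the maximally affordable batch size in steps of `unit`, then
    report (affordable_batch, average inference time per sample in seconds).
    `infer(batch_size)` should run one forward pass and raise on failure
    (e.g., out of memory)."""
    # Probe: increase the batch size by `unit` until inference fails.
    affordable = 0
    for bs in range(unit, max_batch_limit + 1, unit):
        try:
            infer(bs)
        except Exception:
            break
        affordable = bs
    if affordable == 0:
        raise RuntimeError("even the smallest unit batch does not fit")

    # Time at the largest affordable batch, discarding warm-up runs.
    for _ in range(warmup):
        infer(affordable)
    start = time.perf_counter()
    for _ in range(repeats):
        infer(affordable)
    elapsed = time.perf_counter() - start
    return affordable, elapsed / (repeats * affordable)
```

In practice the probe step would catch a framework-specific out-of-memory error rather than a bare `Exception`, and timing would synchronize the accelerator before reading the clock.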
Software Dependencies | No | "For the model training, we choose the AdamW optimizer with default parameters in PyTorch"
Experiment Setup | Yes | "For the model training, we choose the AdamW optimizer with default parameters in PyTorch, and employ a cosine-annealed learning rate schedule from 5e-4 to 1e-5. The models are trained for up to 350 epochs with 5-patience early stopping to prevent overfitting. The default hyperparameter configurations are as follows: (i) We set the mask ratio ρ = |M|/Nt to 0.75 on the FCVID dataset and 0.5 on the rest of the datasets. (ii) The temperature factor τ in Equations (20) and (21) is set to 0.5. (iii) The number of semantic centers Nc is set to 450 on FCVID and 100 on the other datasets. ... we use 6 layers for the encoder and 1 layer for the decoder. The latent dimensions of the encoder and decoder are set to 256 and 192, respectively."
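The cosine-annealed learning rate schedule described in the setup (5e-4 decayed to 1e-5 over up to 350 epochs) has a simple closed form; a minimal sketch is below. The function name and defaults are ours, but the formula matches the standard cosine annealing rule (as implemented by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=1e-5`).

```python
import math

def cosine_annealed_lr(epoch, total_epochs=350, lr_max=5e-4, lr_min=1e-5):
    """Learning rate at a given epoch under cosine annealing:
    starts at lr_max, decays to lr_min at total_epochs."""
    t = min(epoch, total_epochs)  # clamp in case of early stopping overshoot
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / total_epochs))
```

For example, the schedule gives 5e-4 at epoch 0, the midpoint value 2.55e-4 at epoch 175, and 1e-5 at epoch 350.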