Hierarchical Vector Quantization for Unsupervised Action Segmentation

Authors: Federico Spurio, Emad Bahrami, Gianpiero Francesca, Juergen Gall

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on three public datasets, namely Breakfast, YouTube Instructional and IKEA ASM. Our approach outperforms the state of the art in terms of F1 score, recall and JSD. ... We provide additional implementation details in the supp. material, which also includes additional ablation studies on the impact of the hyperparameters."
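The JSD metric mentioned above compares the distribution of predicted segment labels against the ground-truth label distribution. A minimal sketch of how a frame-level JSD could be computed, assuming base-2 Jensen-Shannon divergence over normalized label histograms (the paper's exact formulation is not reproduced here; the label sequences and class count are illustrative):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Illustrative frame-level label sequences for one video (not from the paper).
pred = np.array([0, 0, 1, 1, 1, 2, 2])
gt = np.array([0, 0, 0, 1, 1, 2, 2])

k = 3  # number of sub-action classes (assumed)
# Normalized label histograms over all frames of the video.
p = np.bincount(pred, minlength=k) / len(pred)
q = np.bincount(gt, minlength=k) / len(gt)

# SciPy returns the Jensen-Shannon *distance*; squaring yields the divergence,
# which lies in [0, 1] when using base 2.
jsd = jensenshannon(p, q, base=2) ** 2
```

With base 2, identical distributions give a JSD of 0 and disjoint distributions give 1, so lower is better.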
Researcher Affiliation | Collaboration | University of Bonn, Germany; Toyota Motor Europe, Belgium; Lamarr Institute for Machine Learning and Artificial Intelligence, Germany
Pseudocode | No | The paper describes the method's steps within the main text and equations (1)-(9) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/FedeSpu/HVQ
Open Datasets | Yes | "We evaluate the proposed method on three public datasets: Breakfast (Kuehne, Arslan, and Serre 2014), YouTube Instructional (Alayrac et al. 2016) and IKEA ASM (Ben-Shabat et al. 2021)."
Dataset Splits | No | "Following the protocol that has been introduced by Kukleva et al. (2019), we apply our approach to all videos of each activity separately. K is set to the max number of subactions that appear for each activity. This is required for a fair comparison of the methods. We establish the mapping between predicted cluster segments and ground-truth segments via Hungarian matching, which is computed over all videos and their frames of one activity."
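The Hungarian matching between predicted clusters and ground-truth labels described above can be sketched with SciPy's `linear_sum_assignment`. The function name `hungarian_mapping`, the toy sequences, and the overlap-count cost are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_mapping(pred, gt, n_clusters, n_classes):
    """Map predicted cluster ids to ground-truth labels by maximizing
    frame-level overlap, computed over all frames of one activity."""
    overlap = np.zeros((n_clusters, n_classes), dtype=np.int64)
    for c, g in zip(pred, gt):
        overlap[c, g] += 1
    # Maximizing total overlap == minimizing its negation.
    rows, cols = linear_sum_assignment(-overlap)
    return dict(zip(rows, cols))


# Illustrative frame-level sequences: cluster ids vs. ground-truth labels.
pred = np.array([0, 0, 1, 1, 2, 2, 2])
gt = np.array([1, 1, 0, 0, 2, 2, 2])
mapping = hungarian_mapping(pred, gt, n_clusters=3, n_classes=3)
# -> cluster 0 maps to label 1, cluster 1 to label 0, cluster 2 to label 2
```

Because the assignment is one-to-one, this follows the common unsupervised-segmentation convention of matching each cluster to at most one ground-truth class before computing accuracy-style metrics.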
Hardware Specification | Yes | "We measured the runtime on the same workstation with Intel i9-13900k CPU with 24 cores and one NVIDIA RTX 3090 GPU."
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | "The overall loss for training our model is then the combination of the commitment loss for the vector quantization and the reconstruction loss of the auto-encoder: L = L_commit^z + L_commit^q + λ_rec L_rec (7), where λ_rec weights the two loss terms. ... We also study the impact of the weight parameter λ_rec for the loss term (7) and we report the results in Tab. 5. Using λ_rec = 0.002 performs well and we use it for all other experiments. ... In Tab. 6, we show the results on the Breakfast dataset considering different numbers of prototypes in Z, which is steered by different values of α."
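The quoted loss combines two commitment terms with a weighted reconstruction term. A minimal NumPy sketch of Eq. (7), assuming mean-squared-error forms for both the commitment and reconstruction losses (the paper's exact distance terms may differ; all function and variable names are illustrative):

```python
import numpy as np


def commitment_loss(features, quantized):
    # Encourages encoder outputs to stay close to their codebook entries
    # (in a VQ framework the codebook side is treated as detached).
    return np.mean((features - quantized) ** 2)


def total_loss(z, z_q, q, q_q, x, x_rec, lambda_rec=0.002):
    """Eq. (7): L = L_commit^z + L_commit^q + lambda_rec * L_rec.

    z, q are features at the two quantization levels; z_q, q_q their
    quantized versions; x, x_rec are the input and its reconstruction.
    """
    l_commit_z = commitment_loss(z, z_q)
    l_commit_q = commitment_loss(q, q_q)
    l_rec = np.mean((x - x_rec) ** 2)  # assumed MSE reconstruction loss
    return l_commit_z + l_commit_q + lambda_rec * l_rec
```

The default `lambda_rec=0.002` mirrors the value the ablation in Tab. 5 reports as performing well.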