Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Fair Video Summarization
Authors: Anshuman Chhabra, Kartik Patwari, Chandana Kuntala, Sristi, Deepak Kumar Sharma, Prasant Mohapatra
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to benchmark the fairness of various state-of-the-art video summarization models. Our results highlight the need for better models that balance accuracy and fairness to ensure equitable representation and inclusion in summaries. For completeness, we also provide a novel fair-only baseline, FVS-LP, to showcase the fairness-utility gap models can improve upon. |
| Researcher Affiliation | Academia | Anshuman Chhabra (University of California, Davis); Kartik Patwari (University of California, Davis); Chandana Kuntala (Indira Gandhi Delhi Technical University for Women); Sristi (Indira Gandhi Delhi Technical University for Women); Deepak Kumar Sharma (Indira Gandhi Delhi Technical University for Women); Prasant Mohapatra (University of South Florida, Tampa) |
| Pseudocode | No | We now present a baseline for fair video summarization, the Fair Video Summarization Linear Program (FVS-LP), which is a simple linear program (Schrijver, 1998) approach that only optimizes for fairness and selects frames such that the group proportions in the selected summary are as close as possible to the group proportions of the overall video. Note that this contribution is analogous to having a constant predictor in fair classification (Mehrabi et al., 2021): as it predicts a constant, it will always achieve maximum fairness but low utility/accuracy. However, in fair video summarization, even such a simple fair-only baseline is not as conceptually straightforward as a constant/random predictor, and hence, we propose FVS-LP to showcase the fairness-utility tradeoff gap. Let $0_m$ and $1_m$ denote $m$-length vectors of all zeros and all ones, respectively. We have a given video $V$ and its set of frames $X$, along with the set of group memberships $H$. First, we transform $H$ to matrix form for formulating the LP. Let $G \in \{0, 1\}^{n \times g}$ be derived from $H$ such that each row vector $G_i \in \{0, 1\}^g$, $i \in [n]$, represents a frame and each of its entries is either 0 for absence or 1 for presence of a group in the frame. Let $0_n \le x \le 1_n$ be the optimization variable, where each entry of $x \in \mathbb{R}^n$ indicates whether a frame is selected in the summary; then the LP can be written as Equation 3: minimize $0_n^\top x$ s.t. $G^\top x = k \cdot 1_g$. |
| Open Source Code | Yes | The GitHub repository contains all the code needed for reproducing experiments, and also hosts the FairVidSum dataset. It is located here: https://github.com/anshuman23/fair_video_summarization_tmlr |
| Open Datasets | Yes | To this end, we propose the FairVidSum dataset containing multiple individuals spanning diverse settings such as interviews, podcasts, and panel discussions. Unlike the other datasets, we provide manual annotations for sensitive attributes (fairness) as well as frame importance scores (utility). We introduce the FairVidSum benchmark dataset, designed similarly to existing SOTA video summarization benchmarks TVSum (Song et al., 2015) and SumMe (Gygli et al., 2014), which contains annotated individual and group-level fairness information. The GitHub repository contains all the code needed for reproducing experiments, and also hosts the FairVidSum dataset. It is located here: https://github.com/anshuman23/fair_video_summarization_tmlr |
| Dataset Splits | Yes | We follow the standard evaluation procedure in existing video summarization literature, which involves randomly splitting the entire dataset into multiple parts or splits, typically 5, with each split subjected to an 80:20 train/test partitioning (Apostolidis et al., 2022; 2020a;b; Zhu et al., 2020; Fajtl et al., 2019; Apostolidis et al., 2021b; Kanafani et al., 2021). The models are trained on the training set of a given split and subsequently evaluated on the corresponding test set within the same split. A detailed breakdown of the video distribution for all 5 train/test splits is presented in Appendix B.1. |
| Hardware Specification | Yes | The experiments were conducted on Ubuntu 20.04 using NVIDIA GeForce RTX 3070 GPUs (CUDA version 11.1). |
| Software Dependencies | Yes | We use Python 3.8.16 and Anaconda to install all required libraries to run all models. The Anaconda environment yaml file is provided in our repository. All training code utilized in our experiments was directly obtained from the official GitHub repositories of the respective models, all implemented in PyTorch (v1.12.1). |
| Experiment Setup | Yes | When training the various models, we adhere to their original procedures, and generally employ default settings and hyperparameters. Any alterations or adjustments to the default training parameters are detailed in Appendix B. All adjustments made to parameters to train models on FairVidSum: AC-SUM-GAN: regularization_factor = 5.0, clip = 1.0, action_state_size = 8; CA-SUM: block_size = 60, init_gain = 1.0, n_epochs = 200, clip = 1.0, lr = 1e-4, l2_req = 1e-6, reg_factor = 5.0; PGL-SUM: clip = 1.0, lr = 1e-4, l2_req = 1e-4; SUM-GAN-AAE: clip = 1.0, hidden_size = 512, regularization_factor = 5.0, lr = 1e-5; SUM-GAN-SL: clip = 1.0, hidden_size = 512, regularization_factor = 5.0 |
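The FVS-LP baseline quoted under "Pseudocode" can be expressed as a small linear program. The sketch below is a hypothetical illustration, not the authors' released code: the function name `fvs_lp`, the use of SciPy's `linprog`, and the literal reading of the constraint as G^T x = k·1 are our assumptions, and the paper's Equation 3 may include additional constraints.

```python
import numpy as np
from scipy.optimize import linprog

def fvs_lp(G, k):
    """Find a fractional frame selection 0 <= x <= 1 satisfying G^T x = k * 1.

    G: (n, g) binary matrix; row i marks which groups appear in frame i.
    k: per-group budget, read literally from the quoted constraint.
    """
    n, g = G.shape
    c = np.zeros(n)  # constant objective: any feasible point is optimal
    res = linprog(c, A_eq=G.T, b_eq=k * np.ones(g),
                  bounds=[(0.0, 1.0)] * n, method="highs")
    return res.x if res.success else None

# Toy example (made up for illustration): 4 frames, 2 groups, budget k = 1.
G = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
x = fvs_lp(G, 1)
```

The returned `x` is fractional; a concrete summary would then be obtained by rounding or taking the highest-valued entries per group.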
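The evaluation protocol quoted under "Dataset Splits" (5 random splits, each with an 80:20 train/test partition) could be reproduced with a sketch like the following; the function name, the fixed seed, and the use of integer video IDs are illustrative assumptions, not details from the paper.

```python
import random

def make_splits(video_ids, n_splits=5, train_frac=0.8, seed=0):
    """Generate n_splits independent random 80:20 train/test partitions."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    splits = []
    for _ in range(n_splits):
        ids = list(video_ids)
        rng.shuffle(ids)
        cut = int(train_frac * len(ids))
        splits.append({"train": ids[:cut], "test": ids[cut:]})
    return splits

# Example: 10 hypothetical video IDs -> 5 splits of 8 train / 2 test videos.
splits = make_splits(range(10))
```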