Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

PNeRV: A Polynomial Neural Representation for Videos

Authors: Sonam Gupta, Snehal Singh Tomar, Grigorios Chrysos, Sukhendu Das, Rajagopalan N Ambasamudram

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We split our experimental analysis of PNeRV into (1) evaluation of the representation ability using Video Reconstruction task (2) testing the efficacy on the proposed downstream tasks (3) performing appropriate ablation studies to assess the contributions and salience of individual design elements. The downstream tasks we perform include (i) Video Compression to assess the applicability of PNeRV as an alternate lightweight video representation (ii) Video Super-Resolution to assess the spatial continuity of PNeRV (iii) Video Interpolation to assess the temporal continuity of PNeRV (iv) Video Denoising as an interesting application of PNeRV. We also compare the rate of convergence (during training) of PNeRV vis-à-vis prior art.
Researcher Affiliation Academia Sonam Gupta EMAIL Department of Computer Science & Engineering, IIT Madras Snehal Singh Tomar EMAIL Department of Electrical Engineering, IIT Madras Grigorios G Chrysos EMAIL University of Wisconsin-Madison Sukhendu Das EMAIL Department of Computer Science & Engineering, IIT Madras A. N. Rajagopalan EMAIL Department of Electrical Engineering, IIT Madras
Pseudocode Yes A.3 The Hierarchical Patch-wise Spatial Sampling Algorithm. Algorithm 1 Hierarchical Patch-wise Spatial Sampling. Input: C, H, D, K, L, M, N, P_ij, P_kl. Output: λ_ij, Λ_ij
Open Source Code No The paper does not contain an unambiguous statement about releasing code or a direct link to a code repository for the methodology described.
Open Datasets Yes We train and evaluate our model on the widely used UVG dataset (Mercat et al., 2020) and the "Big Buck Bunny" video sequence from scikit-video. The UVG dataset comprises 7 videos.
Dataset Splits Yes Following E-NeRV's setup, we divide the training sequence in a 3:1 ("seen:unseen") ratio such that for every four consecutive frames, the fourth frame is not used during training. This "unseen" frame is interpolated during inference to quantitatively evaluate the model's performance.
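The 3:1 seen/unseen split described above can be sketched as follows; this is a minimal illustration assuming 0-indexed frames, not code from the paper:

```python
def split_frames(num_frames):
    """For every four consecutive frames, hold out the fourth as 'unseen'
    (evaluated via interpolation); the remaining three are 'seen' (3:1 ratio)."""
    seen = [i for i in range(num_frames) if (i + 1) % 4 != 0]
    unseen = [i for i in range(num_frames) if (i + 1) % 4 == 0]
    return seen, unseen

# With 8 frames: seen = [0, 1, 2, 4, 5, 6], unseen = [3, 7]
seen, unseen = split_frames(8)
```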
Hardware Specification Yes Table 12 (b) provides a quantitative comparison with state-of-the-art in terms of time taken (ms) to perform one forward pass of the model on NVIDIA GeForce RTX 3090 GPU.
Software Dependencies No The paper mentions the use of "Adam optimizer (Kingma & Ba, 2014)" and a "cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016)", but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x).
Experiment Setup Yes For all our experiments, we train each model for 300 epochs with a batch size 16 (unless specified otherwise) with up-scale factors set to 5, 2, 2. The input embeddings Γ_FPE, Γ_TSE, and Γ_PPE are computed with ν = 1.25. We set l = 80 for Γ_FPE and Γ_TSE. Whereas, Γ_TSE uses α = 40. The network is trained using Adam optimizer (Kingma & Ba, 2014) with default hyperparameters, a learning rate of 5e-4, and a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016).
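The cosine annealing schedule quoted above follows a standard closed form; a minimal sketch, assuming the schedule anneals over the full 300 epochs from the stated base rate of 5e-4 down to a minimum of 0 (the excerpt does not specify the minimum):

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, base_lr=5e-4, min_lr=0.0):
    """Cosine annealing: lr decays from base_lr to min_lr over total_epochs,
    following min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * t / T))."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))

# lr starts at base_lr, is halfway at epoch 150, and reaches min_lr at epoch 300
lrs = [cosine_annealing_lr(e) for e in (0, 150, 300)]
```

In practice this corresponds to wrapping the Adam optimizer with PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)`.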