Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PNeRV: A Polynomial Neural Representation for Videos
Authors: Sonam Gupta, Snehal Singh Tomar, Grigorios Chrysos, Sukhendu Das, Rajagopalan N. Ambasamudram
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We split our experimental analysis of PNeRV into (1) evaluation of the representation ability using the Video Reconstruction task, (2) testing the efficacy on the proposed downstream tasks, and (3) performing appropriate ablation studies to assess the contributions and salience of individual design elements. The downstream tasks we perform include (i) Video Compression, to assess the applicability of PNeRV as an alternate lightweight video representation; (ii) Video Super-Resolution, to assess the spatial continuity of PNeRV; (iii) Video Interpolation, to assess the temporal continuity of PNeRV; and (iv) Video Denoising, as an interesting application of PNeRV. We also compare the rate of convergence (during training) of PNeRV vis-à-vis prior art. |
| Researcher Affiliation | Academia | Sonam Gupta (EMAIL), Department of Computer Science & Engineering, IIT Madras; Snehal Singh Tomar (EMAIL), Department of Electrical Engineering, IIT Madras; Grigorios G. Chrysos (EMAIL), University of Wisconsin-Madison; Sukhendu Das (EMAIL), Department of Computer Science & Engineering, IIT Madras; A. N. Rajagopalan (EMAIL), Department of Electrical Engineering, IIT Madras |
| Pseudocode | Yes | A.3 The Hierarchical Patch-wise Spatial Sampling Algorithm. Algorithm 1: Hierarchical Patch-wise Spatial Sampling. Input: C, H, D, K, L, M, N, P_ij, P_kl. Output: λ_ij, Λ_ij |
| Open Source Code | No | The paper does not contain an unambiguous statement about releasing code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | We train and evaluate our model on the widely used UVG dataset (Mercat et al., 2020) and the "Big Buck Bunny" video sequence from scikit-video. The UVG dataset comprises 7 videos. |
| Dataset Splits | Yes | Following E-NeRV's setup, we divide the training sequence in a 3:1 ("seen:unseen") ratio such that for every four consecutive frames, the fourth frame is not used in training. This "unseen" frame is interpolated during inference to quantitatively evaluate the model's performance. |
| Hardware Specification | Yes | Table 12 (b) provides a quantitative comparison with the state-of-the-art in terms of time taken (ms) to perform one forward pass of the model on an NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions the use of "Adam optimizer (Kingma & Ba, 2014)" and a "cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016)", but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | For all our experiments, we train each model for 300 epochs with a batch size of 16 (unless specified otherwise), with up-scale factors set to 5, 2, 2. The input embeddings Γ_FPE, Γ_TSE, and Γ_PPE are computed with ν = 1.25. We set l = 80 for Γ_FPE and Γ_TSE, whereas Γ_TSE uses α = 40. The network is trained using the Adam optimizer (Kingma & Ba, 2014) with default hyperparameters, a learning rate of 5e-4, and a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016). |
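The 3:1 "seen:unseen" split quoted under Dataset Splits (every fourth consecutive frame held out for interpolation at inference) can be sketched as follows. This is a minimal illustration, not code from the paper; the function name and zero-based frame indexing are assumptions.

```python
def split_seen_unseen(num_frames):
    """Split frame indices 3:1: of every four consecutive frames,
    the fourth is held out as 'unseen' and interpolated at inference."""
    seen = [t for t in range(num_frames) if t % 4 != 3]
    unseen = [t for t in range(num_frames) if t % 4 == 3]
    return seen, unseen

# For a 12-frame sequence:
seen, unseen = split_seen_unseen(12)
# seen   -> [0, 1, 2, 4, 5, 6, 8, 9, 10]
# unseen -> [3, 7, 11]
```

For any sequence length divisible by four, this yields exactly three seen frames per unseen frame, matching the stated 3:1 ratio.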
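The cosine annealing schedule cited under Experiment Setup (Loshchilov & Hutter, 2016) follows a standard closed form; a sketch with the paper's stated peak learning rate of 5e-4 and 300 epochs is below. The minimum learning rate of 0 and annealing over the full run are assumptions, as the paper excerpt does not specify them.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, lr_max=5e-4, lr_min=0.0):
    """Cosine annealing (Loshchilov & Hutter, 2016): the learning rate
    decays from lr_max at epoch 0 to lr_min at epoch total_epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

# The rate starts at lr_max, passes half its peak at the midpoint,
# and reaches lr_min at the final epoch.
```

In a PyTorch training loop this would typically be handled by `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around the Adam optimizer mentioned in the quote.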