EvSTVSR: Event Guided Space-Time Video Super-Resolution
Authors: Haojie Yan, Zhan Lu, Zehao Chen, De Ma, Huajin Tang, Qian Zheng, Gang Pan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method not only outperforms existing RGB-based approaches but also excels in handling large motion scenarios. |
| Researcher Affiliation | Academia | ¹The State Key Lab of Brain-Machine Intelligence, Zhejiang University, China; ²College of Computer Science and Technology, Zhejiang University, China; ³School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore |
| Pseudocode | No | The paper describes methods with formulas and block diagrams but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code: https://github.com/hjyyyd/EvSTVSR |
| Open Datasets | Yes | Similar to previous methods that addressed the STVSR task, we followed the training and testing protocols of VideoINR (Chen et al. 2022) to validate our approach on the Adobe240 (Su et al. 2017) and GoPro (Nah, Hyun Kim, and Mu Lee 2017) datasets. Both datasets have a resolution of 1280×720 and a frame rate of 240 fps. We generated events between consecutive frames using vid2e (Gehrig et al. 2020) to simulate realistic event noise, showcasing our method's robustness to noise. The Adobe240 dataset includes 100 training, 16 validation, and 17 testing videos, while the GoPro dataset contains 22 training and 11 testing videos. We trained our model on Adobe and tested it on both Adobe and GoPro, following VideoINR's approach. We used a sliding window of 9 frames, with the 1st and 9th frames, along with intermediate events, as inputs, down-sampled by a factor of 4. The high-resolution frames served as the ground truth. VFI and VSR Datasets. Since our method can independently perform both super-resolution and interpolation tasks, we conducted experiments on two real event datasets to validate their performance thoroughly. Specifically, we performed Video Frame Interpolation (VFI) experiments on the BS-ERGB dataset (Tulyakov et al. 2022). BS-ERGB is widely used for event-guided VFI tasks and is characterized by complex motions, including non-linear and large movements. We trained and tested our method on this dataset and compared the results with previous methods. Additionally, we performed video super-resolution (VSR) experiments on the CED dataset (Scheerlinck et al. 2019), and compared our results with those of prior approaches. |
| Dataset Splits | Yes | The Adobe240 dataset includes 100 training, 16 validation, and 17 testing videos, while the GoPro dataset contains 22 training and 11 testing videos. We trained our model on Adobe and tested it on both Adobe and GoPro, following VideoINR's approach. |
| Hardware Specification | Yes | The experiments were executed on four NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | For all experiments, the Adam optimizer (Kingma 2014) was employed with hyperparameters β1 = 0.9 and β2 = 0.999. The paper mentions the Adam optimizer and RAFT optical flow model, but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For all experiments, the Adam optimizer (Kingma 2014) was employed with hyperparameters β1 = 0.9 and β2 = 0.999. The initial learning rate was set at 4×10⁻⁴ and was systematically reduced to 1×10⁻⁷ through cosine annealing every 150k iterations. The training was conducted over 600k iterations with a batch size of 8. Data augmentation strategies, including random rotations and random cropping, were applied. |
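The 9-frame sliding-window protocol quoted above (1st and 9th frames as low-resolution inputs, all frames as high-resolution ground truth) can be sketched as a plain index generator. This is an illustration only: the paper does not state the window stride, so the stride-1 default here is an assumption, and `sliding_windows` is a hypothetical helper name, not code from the EvSTVSR repository.

```python
def sliding_windows(num_frames, window=9, stride=1):
    """Yield (input_indices, target_indices) pairs for the 9-frame
    sliding-window protocol: the window's endpoint frames are the
    (down-sampled) inputs; every frame in the window is ground truth.
    The stride is an assumption; the paper only specifies the window size.
    """
    for start in range(0, num_frames - window + 1, stride):
        idx = list(range(start, start + window))
        inputs = (idx[0], idx[-1])   # 1st and 9th frames (plus events, not modeled here)
        targets = idx                # all 9 high-resolution frames as GT
        yield inputs, targets
```

For a 240 fps clip, each window thus supervises seven intermediate frames per pair of input key frames.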
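The learning-rate schedule in the Experiment Setup row (4×10⁻⁴ annealed to 1×10⁻⁷ with cosine restarts every 150k iterations) corresponds to a standard cosine-annealing-with-warm-restarts curve. A minimal sketch, assuming the usual closed form; the function name and the exact restart behavior at cycle boundaries are illustrative, not taken from the paper's code.

```python
import math

def cosine_annealed_lr(step, lr_max=4e-4, lr_min=1e-7, period=150_000):
    """Cosine annealing with warm restarts: the learning rate starts at
    lr_max, decays to lr_min over each `period` iterations following a
    half-cosine, then restarts at lr_max for the next cycle."""
    t = step % period  # position within the current 150k-iteration cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))
```

Over the stated 600k training iterations this yields four full annealing cycles, each beginning near 4×10⁻⁴ and ending near 1×10⁻⁷.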