DVI: A Derivative-based Vision Network for INR

Authors: Runzhao Yang, Xiaolong Wu, Zhihong Zhang, Fabian Zhang, Tingxiong Xiao, Zongren Li, Kunlun He, Jinli Suo

ICML 2025

Each row below gives a reproducibility variable, its result, and the supporting LLM response.
Research Type: Experimental — Extensive experiments on five vision tasks across three data modalities demonstrate DVI's superiority over existing methods. Additionally, our study encompasses comprehensive ablation studies to affirm the efficacy of each element of DVI, the influence of different derivative computation techniques, and the impact of derivative orders. Reproducible codes are provided in the supplementary materials. Quantitative results are shown in Tables 1 to 4. Visual results are shown in Figures 3 and S1 to S5.
Researcher Affiliation: Collaboration — (1) Department of Automation, Tsinghua University, Beijing, China; (2) Institute of Advanced Technology, University of Science and Technology of China, Hefei, China; (3) Xiaomi Corporation, Shanghai, China; (4) Department of Computer Science, ETH Zurich, Switzerland; (5) The People's Liberation Army General Hospital, Beijing, China; (6) Institute of Brain and Cognitive Sciences, Tsinghua University, Beijing, China; (7) Shanghai Artificial Intelligence Laboratory, Shanghai, China.
Pseudocode: No — The paper describes the methodology and pipeline using textual descriptions and architectural diagrams (Figures 1 and 2), but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code: Yes — "Reproducible codes are provided in the supplementary materials."
Open Datasets: Yes — For the image super-resolution task, we adopted the setup from prior works (Lim et al., 2017; Liang et al., 2021; Li et al., 2023), utilizing DIV2K (Agustsson & Timofte, 2017) as the training set, with Set5 (Bevilacqua et al., 2012), Set14 (Zeyde et al., 2012), BSD100 (Martin et al., 2001b), Urban100 (Huang et al., 2015), and Manga109 (Matsui et al., 2017) serving as test sets. Similarly, for the image denoising task, the setup of (Zhang et al., 2021b; Liang et al., 2021; Li et al., 2023) was followed, employing BSD500 (Martin et al., 2001a) and WED (Ma et al., 2016) as training sets, along with CBSD68 (Martin et al., 2001a), Kodak24 (Franzen, 1999), and McMaster (Zhang et al., 2011) as test sets. For 3D volume segmentation, the setup of (Milletari et al., 2016; Çiçek et al., 2016), using Synapse (Landman et al., 2015), was followed. For video tasks, GoPro (Nah et al., 2017) was used as the benchmark for video deblurring, following the setup in (Cao et al., 2023; Son et al., 2021), and Sintel (Butler et al., 2012) was used for video optical flow estimation, based on the methods described in (Huang et al., 2022; Zhang et al., 2021a).
Dataset Splits: Yes — For the image super-resolution task, DIV2K served as the training set, with Set5, Set14, BSD100, Urban100, and Manga109 serving as test sets. For the image denoising task, BSD500 and WED served as training sets, with CBSD68, Kodak24, and McMaster as test sets; only the first 1000 WED images, sorted by name, were used. For the 3D volume segmentation task on Synapse, the first 18 volumes were used for training and the last 12 for testing. For video deblurring, GoPro was used, keeping only the first 40 frames of each video. For video optical flow estimation, the Sintel training data was divided into training and testing sets in a ratio of 14:9.
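The splits quoted above are all deterministic (first-N items, a head/tail partition, an integer ratio), so they are easy to reconstruct. The sketch below is illustrative only — the helper names and the choice to sort by filename are assumptions, not the authors' released code.

```python
# Hedged sketch of the deterministic dataset splits described above.
# Function names and filename-sorting are assumptions, not the paper's code.

def first_n(items, n):
    """Keep only the first n items sorted by name
    (e.g. the first 1000 WED images)."""
    return sorted(items)[:n]

def head_tail_split(items, n_train):
    """Leading items for training, trailing items for testing
    (e.g. the first 18 Synapse volumes train, the last 12 test)."""
    items = sorted(items)
    return items[:n_train], items[n_train:]

def ratio_split(items, train_part, test_part):
    """Split by an integer ratio (e.g. Sintel training data split 14:9)."""
    items = sorted(items)
    n_train = len(items) * train_part // (train_part + test_part)
    return items[:n_train], items[n_train:]

volumes = [f"vol{i:02d}" for i in range(30)]
train, test = head_tail_split(volumes, 18)
print(len(train), len(test))  # 18 12
```

With 23 Sintel items, `ratio_split(items, 14, 9)` yields exactly 14 training and 9 testing entries; for other lengths the same 14:9 proportion is applied via integer division.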
Hardware Specification: Yes — "These two experiments were conducted on one RTX 3090 GPU."
Software Dependencies: No — The paper mentions SIREN, the Adamax and Adam optimizers, and libraries such as cv2 (OpenCV) and scipy.ndimage.zoom, but it does not specify version numbers for these software components or for the programming language, which reproducibility requires.
Experiment Setup: Yes — A.1.1 (Image Super-resolution Task, Data Preparation): "We convert each downsampled image to INR using SIREN (Sitzmann et al., 2020) with Adamax (Kingma & Ba, 2014) as the optimizer, a learning rate of 1e-3, and 20,000 iterations. Specifically, to keep the representation accuracy of each INR consistent, we set the total number of parameters in the INR based on a percentage of the number of parameters in each image, and the percentage was set to 50%." A.1.2 (Image Super-resolution Task, Training): "For INSP (Xu et al., 2022)... expand the number of layers to 10, and the number of neurons per layer to 1024... We trained 100 epochs with the Adam (Kingma & Ba, 2014) optimizer at a 0.001 learning rate after random initialization. For our approach DVI, we set P in the INR High-Order Derivatives Computation module to 3. When using EDSR as the pre-existing network, we set K to 2 and select the outputs of the conv first layer and body layer in EDSR as intermediate features for fusion. When using SwinIR as the pre-existing network, we set K to 2 and select the outputs of the conv first layer and conv after body layer in SwinIR as intermediate features for fusion. We trained DVI with the pre-existing configuration (optimizer, learning rate, etc.)." Similar detailed configurations are provided for the other tasks in sections A.2 through A.5.
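The quoted 50% parameter budget in A.1.1 implies sizing each SIREN so its weight count stays near half the image's pixel count. A minimal sketch of that sizing arithmetic follows; the MLP depth, the 2-D coordinate input, the RGB output, and the function names are assumptions for illustration, not details stated in the paper.

```python
# Hedged sketch: pick a SIREN hidden width whose parameter count fits
# a budget of ratio * (pixels * channels), per A.1.1's 50% rule.
# Depth, in_dim=2 (x, y coords), and out_dim=3 (RGB) are assumed defaults.

def siren_param_count(width, depth, in_dim=2, out_dim=3):
    """Weights + biases of an MLP with `depth` hidden layers of size `width`."""
    n = (in_dim + 1) * width                  # input layer
    n += (depth - 1) * (width + 1) * width    # remaining hidden layers
    n += (width + 1) * out_dim                # output layer
    return n

def width_for_budget(h, w, channels=3, ratio=0.5, depth=3):
    """Largest hidden width whose parameter count stays within
    ratio * (number of image parameters, i.e. pixels * channels)."""
    budget = ratio * h * w * channels
    width = 1
    while siren_param_count(width + 1, depth) <= budget:
        width += 1
    return width

w = width_for_budget(64, 64)          # 64x64 RGB image, 50% budget
print(w, siren_param_count(w, 3))     # 53 6045  (budget is 6144)
```

The search is a simple linear scan; since the parameter count grows quadratically in the width, the chosen width is the largest one not exceeding the budget, which keeps representation capacity comparable across images of different sizes.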