Chasing Better Deep Image Priors between Over- and Under-parameterization
Authors: Qiming Wu, Xiaohan Chen, Yifan Jiang, Zhangyang Wang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results validate the superiority of LIPs: we can successfully locate the LIP subnetworks from overparameterized DIPs at substantial sparsity ranges. Those LIP subnetworks significantly outperform deep decoders under comparably compact model sizes (by often fully preserving the effectiveness of their over-parameterized counterparts), and they also possess high transferability across different images as well as restoration task types. Besides, we also extend LIP to compressive sensing image reconstruction, where a pre-trained GAN generator is used as the prior (in contrast to untrained DIP or deep decoder), and confirm its validity in this setting too. To our best knowledge, this is the first time that LTH is demonstrated to be relevant in the context of inverse problems or image priors. Codes are available at https://github.com/VITA-Group/Chasing-Better-DIPs. |
| Researcher Affiliation | Collaboration | Qiming Wu, University of California, Santa Barbara; Xiaohan Chen, Decision Intelligence Lab, DAMO Academy, Alibaba Group (U.S.); Yifan Jiang, University of Texas at Austin; Zhangyang Wang, University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1 (Single-Image IMP). Input: the desired sparsity s, the random code z, the untrained model f_u. Output: a sparse DIP model f(z; θ ⊙ m) with image-prior properties. Initialization: set the mask m_u = 1 ∈ R^{‖θ‖_0}, iteration i = 0, training epochs N, j ∈ [0, N]. While the sparsity of m_u < s: (1) train f_u(z; θ_0 ⊙ m_u) for N epochs; (2) create the new mask m'_u; (3) update the mask m_u = m'_u; (4) reset the model parameters to f(z; θ_j); (5) create the sparse model f(z; θ_j ⊙ m_u); (6) i++. Algorithm 2 (Weight-Sharing IMP). Input: the desired sparsity s, the random code z, the untrained model f_u, the degraded image x, and images from n domains x_a ∈ {x_1, x_2, ..., x_n}. Output: a sparse DIP model f(z; θ ⊙ m) with image-prior properties. Initialization: as in Algorithm 1. While the sparsity of m_u < s: (1) loss = Σ_{a=1}^{n} E(f(z; θ ⊙ m); x_a); (2) train f_u(z; θ_0 ⊙ m_u) by backpropagating the loss for N epochs; (3) update the mask m_u = m'_u; (4) reset the model parameters to f(z; θ_j); (5) create the sparse model f(z; θ_j ⊙ m_u); (6) i++. |
| Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Chasing-Better-DIPs. |
| Open Datasets | Yes | For evaluation datasets, we use the popular Set5 Bevilacqua et al. (2012) and Set14 Zeyde et al. (2010). We also evaluate the transferability of subnetworks on image classification datasets such as ImageNet-20 Deng et al. (2009) and CIFAR10 Krizhevsky et al. (2009). We use PGGAN Karras et al. (2017) pre-trained on the CelebA-HQ dataset Lee et al. (2020) as the model in this section. |
| Dataset Splits | Yes | For evaluation datasets, we use the popular Set5 Bevilacqua et al. (2012) and Set14 Zeyde et al. (2010). We evaluate the multi-image IMP in two different settings: (i) a cross-domain setting, where we apply the multi-image IMP to the five images from Set5 Bevilacqua et al. (2012); (ii) a single-domain setting, where we apply the multi-image IMP to five images of human faces with glasses. We consider the images from Set5 more diversified because they include bird, butterfly, and human face content. We compare single-image IMP winning tickets found on the F16 and Woman images from Set5 with the cross-domain ticket, and the single-image IMP winning tickets found on the Face-4 and Face-2 images with the single-domain ticket. In each IMP iteration, PGGAN is first fine-tuned on 40% of the images in CelebA-HQ for 30 epochs, has 20% of its remaining weights pruned, and is then reset to the pre-trained weights. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It describes the experimental methods and results but omits hardware specifications. |
| Software Dependencies | No | The paper does not explicitly provide specific version numbers for software dependencies used in their own methodology. While it links to the code of a third-party SGLD-DIP model, it does not list the dependencies with versions for the work presented in this paper. |
| Experiment Setup | Yes | The parameter count of the original DIP model is 2.2 million (M); that of the deep decoder is 0.1 M for denoising and super-resolution experiments and 0.6 M for inpainting experiments, all following the original settings of Heckel & Hand (2018). The model sizes are plotted as horizontal coordinates in the figures. We run all experiments with 10 different random seeds: every solid curve is plotted over the 10-run average, and the accompanying shadow regions indicate the 10-run variance. In each IMP iteration, models are trained on the standard DIP objective to fit the degraded observations for a certain number of training steps, following the original DIP Ulyanov et al. (2018). We prune 20% of the remaining weights in each IMP iteration, resulting in sparsity ratios s_i = 1 − 0.8^i. We fix the number of measurements to 1,000 with 20 corrupted measurements, and minimize the MOM objective (Median-of-Means, an algorithm proposed by Jalal et al. (2020)) for 1,500 iterations to recover the images. We compare the performance (measured in per-pixel reconstruction error) of LIP with the dense baselines in the first row of Table 1 and provide a visual example in Fig. 8(a). We trained these subnetworks on the denoising task on the Baby image for 3,000 iterations. |
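The core loop behind both IMP algorithms and the reported sparsity schedule (prune 20% of the remaining weights per round, rewind the survivors to their initial values, yielding s_i = 1 − 0.8^i) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight vector is a stand-in for the DIP network's parameters, the DIP training step between pruning rounds is omitted, and the function names are hypothetical.

```python
import numpy as np

def imp_sparsity_schedule(n_rounds, frac=0.2):
    # Sparsity after i rounds of pruning `frac` of remaining weights: s_i = 1 - (1 - frac)^i
    return [1 - (1 - frac) ** i for i in range(1, n_rounds + 1)]

def magnitude_prune(weights, mask, frac=0.2):
    # Zero out the `frac` of surviving weights with the smallest magnitude.
    alive = np.flatnonzero(mask)
    k = int(len(alive) * frac)
    drop = alive[np.argsort(np.abs(weights[alive]))[:k]]
    new_mask = mask.copy()
    new_mask[drop] = 0.0
    return new_mask

rng = np.random.default_rng(0)
theta0 = rng.normal(size=1000)        # stand-in for the untrained DIP weights θ_0
mask = np.ones_like(theta0)
weights = theta0.copy()
target_sparsity = 0.8                 # the desired sparsity s

while 1.0 - mask.mean() < target_sparsity:
    # Steps 1-2 of Algorithm 1 (training f(z; θ ⊙ m) on the degraded image for
    # N epochs) are omitted; magnitudes are taken from the current weights.
    mask = magnitude_prune(weights, mask, frac=0.2)
    weights = theta0 * mask           # lottery-ticket rewind: survivors reset to θ_0
```

After eight rounds of 20% pruning the mask first crosses 80% sparsity, matching 1 − 0.8^8 ≈ 0.832 up to integer rounding of the per-round prune count.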