Predicting the Encoding Error of SIRENs

Authors: Jeremy Vonderfecht, Feng Liu

TMLR 2024

Reproducibility Variable: Result — Supporting Evidence
Research Type: Experimental — "Towards this goal, we present a method which predicts the encoding error that a popular INR network (SIREN) will reach, given its network hyperparameters and the signal to encode. This method is trained on a unique dataset of 300,000 SIRENs, trained across a variety of images and hyperparameters. Our predictive method demonstrates the feasibility of this regression problem, and allows users to anticipate the encoding error that a SIREN network will reach in milliseconds instead of minutes or longer. Our encoding error prediction networks outperform these simple baselines. On the single-architecture dataset, our model predicts the SIREN's PSNR to within 0.30 dB RMSE, with an R² score of 0.996."
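The two headline metrics quoted above, RMSE in dB and the R² score, are standard regression diagnostics. As a minimal sketch (the PSNR values below are made-up toy numbers, not the paper's results):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error; here the units are dB of PSNR."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 minus residual/total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical true vs. predicted SIREN PSNRs (dB) for illustration only.
true_psnr = [32.1, 28.4, 35.0, 30.2]
pred_psnr = [32.0, 28.6, 34.8, 30.5]
```

A predictor matching the paper's reported quality would yield an RMSE near 0.30 dB and an R² close to 1.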
Researcher Affiliation: Academia — Jeremy Vonderfecht (EMAIL), Department of Computer Science, Portland State University; Feng Liu (EMAIL), Department of Computer Science, Portland State University
Pseudocode: No — The paper describes methods and equations but does not contain a dedicated pseudocode or algorithm block.
Open Source Code: No — The paper mentions that "initial weights for each SIREN can be reconstructed using our source code," but it provides neither an explicit public-release statement nor a link to the source code for the methodology described in the paper.
Open Datasets: Yes — "Dataset available here: huggingface.co/datasets/predict-SIREN-PSNR/COIN-collection. We have made this dataset publicly available. In this paper, we make use of two image datasets: Kodak (1991) and MSCOCO (Lin et al., 2014)."
Dataset Splits: Yes — "The training/validation/test splits are done such that no image appears in two splits. We use an 80/10/10 train/validation/test split, and select the version of the network which obtained the best validation set accuracy during training."
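Because each image yields many SIREN samples (one per hyperparameter setting), an 80/10/10 split with no image in two splits must be made at the image level, not the sample level. A minimal sketch of such a group-disjoint split (the data structure and function name are illustrative, not the paper's code):

```python
import random

def split_by_image(sample_image_ids, seed=0, frac=(0.8, 0.1, 0.1)):
    """Assign whole images to train/val/test so no image spans two splits.

    `sample_image_ids` maps each SIREN sample index to its source image id
    (a hypothetical bookkeeping structure for this sketch).
    """
    images = sorted(set(sample_image_ids))
    rng = random.Random(seed)
    rng.shuffle(images)
    n_train = int(frac[0] * len(images))
    n_val = int(frac[1] * len(images))
    train_imgs = set(images[:n_train])
    val_imgs = set(images[n_train:n_train + n_val])
    splits = {"train": [], "val": [], "test": []}
    for idx, img in enumerate(sample_image_ids):
        if img in train_imgs:
            splits["train"].append(idx)
        elif img in val_imgs:
            splits["val"].append(idx)
        else:
            splits["test"].append(idx)
    return splits
```

Splitting at the sample level instead would leak information, since near-identical SIRENs of the same image could land in both train and test.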
Hardware Specification: Yes — "For example, training a SIREN to compress a 512x768 pixel image to 0.3 bits per pixel with the method from Dupont et al. (2021) takes 50 minutes on a Titan X GPU. With a forward-pass time of 70 ms on a Titan X GPU, as opposed to an original encoding time of 2.5 minutes per network, this allows us to predict PSNR 2,000× faster than we could by training the SIREN."
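The quoted ~2,000× speedup follows directly from the two timings in the evidence: a 2.5-minute encoding replaced by a 70 ms forward pass.

```python
# Sanity check of the ~2,000x claim using the timings quoted from the paper.
encoding_time_s = 2.5 * 60   # 2.5 minutes to train one SIREN
forward_pass_s = 0.070       # 70 ms predictor forward pass on a Titan X GPU

speedup = encoding_time_s / forward_pass_s
print(round(speedup))  # ~2143, i.e. on the order of 2,000x
```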
Software Dependencies: No — The paper mentions the "PyTorch Image Models library (TIMM) (Wightman, 2019)" and "Scikit-Learn's GaussianProcessRegressor class." While specific software tools are named, explicit version numbers for these components or other key software are not provided.
Experiment Setup: Yes — "We find that across all of our networks, a learning rate of 0.001 is near-optimal. We also find that our SIREN networks show diminishing returns in PSNR after around 20,000 training steps. Therefore, we fix the number of training steps to 20,000 for all our networks. To train our single-architecture encoding error predictor, we use the ADAM optimizer with a learning rate of 0.0001 and a batch size of 8. We start by freezing the weights of the pretrained classifier network, and training just the MLP regression head for 10 epochs. Then we unfreeze the classifier and train the entire network for 10 epochs."
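The two-stage schedule described above (frozen backbone with a trainable head, then full fine-tuning) can be sketched in PyTorch. The tiny modules below are stand-ins; the paper uses a pretrained TIMM classifier as the backbone and an MLP regression head:

```python
import torch
from torch import nn

# Illustrative stand-ins for the pretrained classifier backbone and the
# MLP regression head (shapes are arbitrary toy values for this sketch).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(16, 8), nn.ReLU())
head = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1))
model = nn.Sequential(backbone, head)

def set_requires_grad(module, flag):
    """Freeze or unfreeze every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: freeze the backbone and train only the regression head
# (Adam, lr 1e-4, per the quoted setup).
set_requires_grad(backbone, False)
opt = torch.optim.Adam(head.parameters(), lr=1e-4)
# ... 10 epochs of head-only training would run here ...

# Stage 2: unfreeze everything and fine-tune the whole network.
set_requires_grad(backbone, True)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# ... 10 more epochs over the full model would run here ...
```

Freezing first lets the randomly initialized head settle before gradients are allowed to perturb the pretrained backbone features.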