Enhancing Implicit Neural Representations via Symmetric Power Transformation

Authors: Weixiang Zhang, Shuzhao Xie, Chengwei Ren, Shijia Ge, Mingzi Wang, Zhi Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments are conducted to verify the performance of the proposed method, demonstrating that our transformation can reliably improve INR compared with other data transformations. We also conduct 1D audio, 2D image and 3D video fitting tasks to demonstrate the effectiveness and applicability of our method."
Researcher Affiliation | Academia | Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China. EMAIL, EMAIL
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/zwx-open/Symmetric-Power-Transformation-INR
Open Datasets | Yes | "We chose SIREN as the backbone, fitting the widely used processed DIV2K dataset (Agustsson and Timofte 2017; Tancik et al. 2020) and Kodak dataset (E. Kodak 1999). We use LibriSpeech (Panayotov et al. 2015) datasets to evaluate the effectiveness of our method." The method is also "evaluated on the ShakeNDry video from the UVG dataset (Mercat, Viitanen, and Vanne 2020)."
Dataset Splits | Yes | "Specifically, we selected the test.clean split of the dataset and cropped each audio to the first 5 seconds at a sampling rate of 16 kHz." Video fitting is "evaluated on the ShakeNDry video from the UVG dataset (Mercat, Viitanen, and Vanne 2020) (the first 30 frames with 1920 × 1080 resolution)."
Hardware Specification | Yes | "All experiments were conducted on 4 GPUs equipped with NVIDIA RTX 3090."
Software Dependencies | No | The paper mentions "l2 loss functions and the Adam optimizer", with SIREN and FINER as backbones, but does not specify version numbers for the software libraries or programming languages required for reproduction.
Experiment Setup | Yes | "We implemented all experiments using l2 loss functions and the Adam optimizer (Kingma and Ba 2015). We set the total number of iterations to 5000, with hyper-parameters ξ = 0.5, τ = 0.1, and κ = 256 in our method. Following the setting of Siamese SIREN (Lanzendörfer and Wattenhofer 2023), we set ω and ω0 both to 100 in the SIREN backbone. We conducted the video fitting with SIREN and FINER backbones, both using the same network size of 6 × 256. Each scenario was trained for 100 epochs."
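The Experiment Setup row pins down the backbone shape (6 × 256 hidden layers, ω = ω0 = 100) but not the code. The following is a minimal NumPy sketch of a SIREN forward pass under that reported configuration; the initialization follows the standard SIREN scheme and is an assumption, not the authors' released implementation, and the training details (Adam, l2 loss, 5000 iterations) are only noted in comments.

```python
import numpy as np

# Hypothetical sketch of the SIREN backbone described in the setup row:
# 6 hidden layers x 256 units, omega = omega_0 = 100. Training (not shown)
# would use an l2 loss and the Adam optimizer for 5000 iterations.
OMEGA_0 = 100.0  # frequency scale, per the reported setting
HIDDEN = 256     # hidden width
DEPTH = 6        # number of hidden layers ("6 x 256")

def init_siren(in_dim, out_dim, rng):
    """Standard SIREN init: first layer U(-1/n, 1/n), later layers
    U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0)."""
    sizes = [in_dim] + [HIDDEN] * DEPTH + [out_dim]
    params = []
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / OMEGA_0
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = rng.uniform(-bound, bound, size=n_out)
        params.append((W, b))
    return params

def siren_forward(params, x):
    """Forward pass: sin(omega_0 * (Wx + b)) on all but the last layer,
    which stays linear."""
    h = x
    for W, b in params[:-1]:
        h = np.sin(OMEGA_0 * (h @ W + b))
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
params = init_siren(in_dim=2, out_dim=3, rng=rng)  # e.g. 2D coords -> RGB
coords = rng.uniform(-1.0, 1.0, size=(1024, 2))
out = siren_forward(params, coords)
print(out.shape)  # (1024, 3)
```

For the 1D audio task the same network would map a single time coordinate to amplitude (in_dim=1, out_dim=1); only the input/output dimensions change across the 1D/2D/3D fitting tasks.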