Fast Training of Sinusoidal Neural Fields via Scaling Initialization

Authors: Taesun Yeom, Sangyoon Lee, Jaeho Lee

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we focus on a popular family of neural fields, called sinusoidal neural fields (SNFs), and study how they should be initialized to maximize the training speed. We find that the standard initialization scheme for SNFs, designed based on the signal propagation principle, is suboptimal. In particular, we show that by simply multiplying each weight (except for the last layer) by a constant, we can accelerate SNF training by 10×. This method, coined weight scaling, consistently provides a significant speedup over various data domains, allowing the SNFs to train faster than more recently proposed architectures. To understand why the weight scaling works well, we conduct extensive theoretical and empirical analyses which reveal that the weight scaling not only resolves the spectral bias quite effectively but also enjoys a well-conditioned optimization trajectory. Section 5 EXPERIMENTS: In this section, we first address whether the weight scaling is effective in other data domains (Section 5.1); our answer is positive. Then, we discuss the factors that determine the optimal value of the scaling factor α for the given target task (Section 5.2); we find that the optimal value does not depend much on the nature of each datum, but rather relies on the structural properties of the workload. We validate the effectiveness of WS in different data domains by comparing against various neural fields. To compare the training speed, we compare the training accuracy for equivalent steps. In particular, we consider the following tasks and baselines. Other details can be found in Appendix F.4. Task: Image regression. The network is trained to approximate the signal intensity c for each given pair of normalized 2D pixel coordinates (x, y). For our experiments, we use the Kodak (Kodak, 1999) and DIV2K (Agustsson & Timofte, 2017) datasets. Each image is resized to a resolution of 512×512 in grayscale, following Lindell et al. (2022); Seo et al. (2024). We report the training PSNR after a full-batch training for 150 iterations. For all NFs, we use five layers with width 512. Table 1. Weight scaling in various data domains: We compare the training speed of the weight-scaled SNF against other baselines in various data domains. To evaluate the training speed, we train for a fixed number of steps and compare the training loss achieved. Bold denotes the best option, and underlined denotes the runner-up. We have experimented with five random seeds, and report the mean and the standard deviation.
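The weight-scaling rule quoted above amounts to applying the standard SIREN-style initialization and then multiplying every weight matrix except the last layer's by a constant α. A minimal NumPy sketch of that idea follows; the function name, the ω = 30 default, and the uniform bounds (taken from the usual SIREN scheme of Sitzmann et al., 2020) are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def siren_weight_scaled_init(layer_dims, omega=30.0, alpha=1.0, seed=0):
    """SIREN-style init with weight scaling.

    Standard SIREN init: first layer ~ U(-1/fan_in, 1/fan_in); deeper
    layers ~ U(-sqrt(6/fan_in)/omega, sqrt(6/fan_in)/omega).
    Weight scaling: multiply every weight matrix EXCEPT the last
    layer's by the constant alpha.
    """
    rng = np.random.default_rng(seed)
    weights = []
    n_layers = len(layer_dims) - 1
    for i, (fan_in, fan_out) in enumerate(zip(layer_dims[:-1], layer_dims[1:])):
        if i == 0:
            bound = 1.0 / fan_in
        else:
            bound = np.sqrt(6.0 / fan_in) / omega
        W = rng.uniform(-bound, bound, size=(fan_out, fan_in))
        if i < n_layers - 1:  # scale all but the last layer
            W *= alpha
        weights.append(W)
    return weights
```

Setting alpha=1.0 recovers the baseline initialization, so the speedup from weight scaling can be ablated by sweeping only this one constant.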
Researcher Affiliation | Academia | Taesun Yeom, Sangyoon Lee & Jaeho Lee, Pohang University of Science and Technology (POSTECH)
Pseudocode | No | The paper describes methods and equations but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide any links to code repositories.
Open Datasets | Yes | Task: Image regression. The network is trained to approximate the signal intensity c for each given pair of normalized 2D pixel coordinates (x, y). For our experiments, we use the Kodak (Kodak, 1999) and DIV2K (Agustsson & Timofte, 2017) datasets. Each image is resized to a resolution of 512×512 in grayscale, following Lindell et al. (2022); Seo et al. (2024). Task: Occupancy field. The network is trained to approximate the occupancy field of a 3D shape, i.e., predict 1 for occupied coordinates and 0 for empty space. We use a voxel grid of size 512×512×512 following Saragadam et al. (2023). For evaluation, we measure the training intersection-over-union (IoU) after 50 iterations on the Lucy data from the Stanford 3D Scanning Repository, with a batch size of 100k. Task: Spherical data. We use 10 randomly selected samples from the ERA5 dataset, which contains temperature values corresponding to a grid of latitude ϕ and longitude θ, using the geographic coordinate system (GCS). Task: Audio data. Audio is a 1D temporal signal, and the network is trained to approximate the amplitude of the audio at a given timestamp. We use the first 7 seconds of Bach's Cello Suite No. 1, following Sitzmann et al. (2020b). Appendix F.5 NEURAL DATASET EXPERIMENTS: For training, we used fit-a-nef (Papa et al., 2024), a JAX-based library for fast construction of large-scale neural field datasets. We used the entire dataset for NF dataset generation (i.e., 60,000 images for MNIST and CIFAR-10, respectively). Appendix F.6 NEURAL RADIANCE FIELDS EXPERIMENTS: Datasets. We use the Lego and Drums data from the NeRF-synthetic dataset (Mildenhall et al., 2020), which is publicly available online.
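The occupancy-field evaluation above scores a thresholded network output against a binary voxel grid with intersection-over-union. A minimal sketch of that metric is below; the 0.5 threshold and the handling of the all-empty case are assumptions, since the excerpt does not specify them.

```python
import numpy as np

def occupancy_iou(pred, target, thresh=0.5):
    """IoU between thresholded occupancy predictions and a binary target grid."""
    p = np.asarray(pred) >= thresh          # binarize network outputs
    t = np.asarray(target).astype(bool)     # ground-truth occupancy
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # both grids empty: treat as perfect agreement
    return np.logical_and(p, t).sum() / union
```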
Dataset Splits | Yes | Appendix F.6 NEURAL RADIANCE FIELDS EXPERIMENTS: Datasets. We use the Lego and Drums data from the NeRF-synthetic dataset (Mildenhall et al., 2020), which is publicly available online. Each dataset contains 100 training images, 100 validation images, and 200 test images, along with their corresponding camera directions.
Hardware Specification | Yes | Appendix F.4 TRAINING SETTINGS AND BASELINES: We used NVIDIA RTX 3090/4090/A5000/A6000 GPUs for all experiments.
Software Dependencies | No | Appendix F.5 NEURAL DATASET EXPERIMENTS: For training, we used fit-a-nef (Papa et al., 2024), a JAX-based library for fast construction of large-scale neural field datasets. Appendix F.6 NEURAL RADIANCE FIELDS EXPERIMENTS: Implementation details. We mainly follow the settings of WIRE (Saragadam et al., 2023), using the torch-ngp codebase. The paper mentions software frameworks and libraries (JAX; PyTorch, implied by torch-ngp) and specific codebases (fit-a-nef, torch-ngp), but does not provide version numbers for any of these components.
Experiment Setup | Yes | Section 5.1 MAIN EXPERIMENTS: To compare the training speed, we compare the training accuracy for equivalent steps... For all NFs, we use five layers with width 512. Task: Image regression... We report the training PSNR after a full-batch training for 150 iterations. Task: Occupancy field... IoU after 50 iterations on the Lucy data... with a batch size of 100k. Task: Spherical data... We report the training PSNR after 5k iterations of full-batch training. Task: Audio data... We report the training PSNR after 1k iterations of full-batch training. Appendix F.4 TRAINING SETTINGS AND BASELINES: In this section, we provide detailed information about our experiments. For data fitting tasks, we used the Adam optimizer (Kingma, 2015) with a learning rate of 1e-04 (except for spherical data, where we use a learning rate of 1e-05), without any learning rate scheduler, except for the occupancy field experiments (in which case, we use a PyTorch learning rate scheduler). Each experiment was conducted with 5 different seeds, and we report both the average and standard deviation of the evaluation metric. Table 2. Detailed information about hyperparameters: We provide the exact hyperparameter settings for each domain in the table below. (This table specifies parameters like k, sigma, m, omega, s, and alpha for the different tasks and baselines.)
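The image-regression protocol above (fit normalized 2D pixel coordinates to intensities, then report training PSNR) reduces to two small utilities, sketched below in NumPy. The [-1, 1] normalization range is an assumption (it is the common choice for SIRENs); the excerpt does not state the exact range used.

```python
import numpy as np

def coord_grid(h, w):
    """Normalized 2D pixel coordinates in [-1, 1]^2, shape (h*w, 2)."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, for intensities in [0, max_val]."""
    mse = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For the 512×512 grayscale images used in the paper, `coord_grid(512, 512)` yields the full-batch input of 262,144 coordinate pairs, and the training PSNR is `psnr` evaluated on the network's predictions for those coordinates.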