NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Authors: Amandine Brunetto, Sascha Hornauer, Fabien Moutarde
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that NeRAF generates high-quality audio on SoundSpaces and RAF datasets, achieving significant performance improvements over prior methods while being more data-efficient. Additionally, NeRAF enhances novel view synthesis of complex scenes trained with sparse data through cross-modal learning. NeRAF is designed as a Nerfstudio module, providing convenient access to realistic audio-visual generation. Project page: https://amandinebtto.github.io/NeRAF |
| Researcher Affiliation | Academia | Amandine Brunetto, Sascha Hornauer, Fabien Moutarde Center for Robotics, Mines Paris PSL University Paris, France EMAIL |
| Pseudocode | No | The paper describes the architecture and methodology in detail using text and figures (Figure 2, Figure 4) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release NeRAF's code to the community. Project page: https://amandinebtto.github.io/NeRAF |
| Open Datasets | Yes | We validate our method on SoundSpaces (Chen et al., 2020a; 2022b), a simulated dataset, and on RAF (Chen et al., 2024), a real-world dataset. |
| Dataset Splits | Yes | Similar to (Su et al., 2022; Luo et al., 2022; Liang et al., 2023a), we use 90% of SoundSpaces audio data for training and 10% for testing. For RAF, we follow previous works' experimental setup: we keep 80% of the data for training and 20% for evaluation. Nerfstudio automatically keeps 90% of them for training and 10% for evaluation. |
| Hardware Specification | Yes | We train our method on a single RTX 4090 GPU. |
| Software Dependencies | No | We implement our method using PyTorch framework (Paszke et al., 2019). We optimize NAcF using Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.999 and ϵ = 10⁻¹⁵. For NeRF, just as AV-NeRF we keep default Nerfacto parameters. The paper mentions PyTorch and Nerfstudio but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We optimize NAcF using Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.999 and ϵ = 10⁻¹⁵. The initial learning rate is 10⁻⁴. It decreases exponentially to reach 10⁻⁸. For NeRF, just as AV-NeRF we keep default Nerfacto parameters. For the first 2k iterations, we only train the NeRF part. It allows the grid to be filled and updated several times using batches of 4,096 voxel-centers. After, both NeRF and NAcF are trained jointly. We use batch sizes of 4,096 for NeRF and 2,048 for NAcF. NeRAF is trained for 500k iterations but most runs reach their peak performance before, depending on the room size. We empirically select λA = 10⁻³, λSC = 10⁻¹ and λSL = 1. |
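
The reported optimizer schedule (initial learning rate 10⁻⁴ decaying exponentially to 10⁻⁸ over the 500k training iterations) can be sketched as below. This is a minimal illustration, not code from the NeRAF release: the paper only states that the rate "decreases exponentially", so a constant per-step decay factor is an assumption here, and the names `gamma` and `lr_at` are illustrative.

```python
# Sketch of the reported schedule: exponential decay from an initial
# learning rate of 1e-4 down to 1e-8 over 500k training iterations.
lr_init, lr_final, total_iters = 1e-4, 1e-8, 500_000

# Per-step decay factor chosen so that lr_init * gamma**total_iters == lr_final.
gamma = (lr_final / lr_init) ** (1.0 / total_iters)

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates."""
    return lr_init * gamma ** step
```

In PyTorch this would correspond to wrapping an Adam optimizer (built with the reported β1 = 0.9, β2 = 0.999, ϵ = 10⁻¹⁵) in `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)` and stepping the scheduler once per iteration.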