EchoDiffusion: Waveform Conditioned Diffusion Models for Echo-Based Depth Estimation
Authors: Wenjie Zhang, Jun Yin, Long Ma, Peng Yu, Xiaoheng Jiang, Zhen Tian, Mingliang Xu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluations on the Replica and Matterport3D datasets demonstrate that EchoDiffusion establishes new benchmarks for state-of-the-art performance in echo-based depth estimation. We conduct experiments using the Replica (Straub et al. 2019) and Matterport3D (Chang et al. 2017) datasets. To accurately assess the predictive capabilities of the EchoDiffusion model, we evaluated its performance on both the Replica and Matterport3D datasets and compared it with the leading echo depth prediction models Bat-Net (Brunetto et al. 2023) and Echo-Net (Parida et al. 2021). In our ablation study, as shown in Table 2, we conducted a comprehensive analysis to emphasize the critical role and efficacy of the MALF network. |
| Researcher Affiliation | Academia | Wenjie Zhang1,2,3, Jun Yin1, Long Ma1, Peng Yu1, Xiaoheng Jiang1,2,3, Zhen Tian1,2,3*, Mingliang Xu1,2,3* 1School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, China 2Engineering Research Center of Intelligent Swarm Systems, Ministry of Education, Zhengzhou, 450001, China 3National Supercomputing Center in Zhengzhou, Zhengzhou, 450001, China EMAIL, EMAIL |
| Pseudocode | No | The paper provides detailed descriptions of the model architecture and methodology, including figures and textual explanations. However, it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code or an algorithm. |
| Open Source Code | Yes | Code https://github.com/wjzhang-ai/EchoDiffusion |
| Open Datasets | Yes | We conduct experiments using the Replica (Straub et al. 2019) and Matterport3D (Chang et al. 2017) datasets. These datasets are widely utilized in echo depth estimation research due to their rich variety of scenes and diverse data. Studies such as (Parida et al. 2021; Irie, Shibata et al. 2022; Brunetto et al. 2023) have employed these datasets to evaluate model performance. |
| Dataset Splits | Yes | The Replica dataset comprises 18 scenes, including hotels, offices, and various room types. For training, 15 scenes with 5,496 instances are utilized, while 3 scenes containing 1,464 instances are reserved for testing. The Matterport3D dataset is larger, comprising 90 scenes in total, including apartments, meeting rooms, shops, and more. Of these, 59 scenes (comprising 40,176 instances) are used for training, 10 scenes (comprising 13,592 instances) for validation, and 8 scenes (comprising 13,602 instances) for testing. |
| Hardware Specification | Yes | The EchoDiffusion model was implemented using PyTorch and trained on an NVIDIA GeForce RTX 4090D. Training times per epoch varied depending on the dataset: approximately 80 seconds for the Replica dataset and around 400 seconds for the Matterport3D dataset. |
| Software Dependencies | No | The EchoDiffusion model was implemented using PyTorch and trained on an NVIDIA GeForce RTX 4090D. PyTorch is mentioned, but no specific version number is provided to ensure reproducibility. |
| Experiment Setup | Yes | The training process spanned 150 epochs for each dataset. A learning rate of 0.0001 was employed, coupled with an L2 regularization coefficient of 0.0005 (Paszke et al. 2019). The AdamW optimizer (Paszke et al. 2019) was utilized for optimization, with a batch size set to 32. |
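The quoted experiment setup maps directly onto a standard PyTorch optimizer configuration. The sketch below shows only that configuration (the tiny network and random tensors are hypothetical placeholders, not the EchoDiffusion architecture); the quoted L2 coefficient is realized here via AdamW's `weight_decay` parameter, which applies decoupled weight decay:

```python
import torch
from torch import nn

# Hypothetical stand-in for the EchoDiffusion denoising network.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

# Hyperparameters quoted in the report: AdamW, lr 0.0001,
# L2 coefficient 0.0005, batch size 32, 150 epochs per dataset.
EPOCHS, BATCH_SIZE = 150, 32
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=5e-4)
loss_fn = nn.MSELoss()

# Dummy tensors standing in for one batch of echo features / depth targets.
inputs = torch.randn(BATCH_SIZE, 64)
targets = torch.randn(BATCH_SIZE, 64)

for _ in range(1):  # one step shown; the paper trains for EPOCHS epochs
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```

Note that AdamW's decoupled weight decay is not mathematically identical to adding an L2 penalty to the loss under Adam-style adaptive updates; the paper does not specify which formulation was used, so this is one plausible reading.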