Frequency-Aware Deep Depth from Focus

Authors: Tao Yan, Yingying Wang, Jiangfeng Zhang, Yuhua Qian, Jieru Jia, Lu Chen, Feijiang Li

IJCAI 2025

Reproducibility Checklist (Variable: Result. LLM Response)
Research Type: Experimental. "Comprehensive experiments demonstrate that our model achieves compelling generalization and state-of-the-art depth prediction across various datasets. Additionally, it can be quickly adapted to real-world applications as a pretrained model. In this section, we detail the evaluation metrics and datasets used in our experiments and compare the proposed FAD method with state-of-the-art DFF methods. Furthermore, we conduct ablation experiments to assess the effectiveness of each component of the proposed network. Finally, we evaluate the model's generalization ability across various synthetic and real microscopic datasets."
Researcher Affiliation: Academia. "Tao Yan, Yingying Wang, Jiangfeng Zhang, Yuhua Qian, Jieru Jia, Lu Chen, Feijiang Li. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China. EMAIL, EMAIL, zjf EMAIL, EMAIL, EMAIL, EMAIL, EMAIL"
Pseudocode: No. The paper describes the methodology using textual explanations and mathematical equations (Eqs. 1-8), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps in a code-like format.
Open Source Code: No. The paper contains no explicit statement about releasing source code, no link to a code repository, and no indication that code is provided in supplementary materials for the described methodology.
Open Datasets: Yes. "The datasets consist of five synthetic datasets (SLFD [Shi et al., 2019], Pov-Ray [Heber and Pock, 2016], DefocusNet [Maximov et al., 2020], 4D Light Field [Honauer et al., 2017], FlyingThings3D [Mayer et al., 2016]) and four real datasets (NYU Depth V2 [Carvalho et al., 2018], DDFF 12-Scene [Hazirbas et al., 2019], Middlebury [Scharstein et al., 2014], Microscopic), among which the microscopic dataset is unlabeled."
Dataset Splits: No. The paper mentions using various datasets for training and testing (e.g., training on FlyingThings3D and testing on Middlebury, SLFD, and Pov-Ray), and states that "Images are randomly cropped into 256×256 patches and fed into the network." However, it does not provide specific split details (percentages, sample counts, or explicit standard split names) for the training, validation, or test subsets of these datasets.
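The random 256×256 cropping quoted above could be implemented roughly as follows. This is a minimal NumPy sketch, not the authors' pipeline; the function name, the (N, H, W, C) focal-stack layout, and the use of a shared window across focus slices are assumptions for illustration:

```python
import numpy as np

def random_crop_stack(stack, size=256, rng=None):
    """Crop the same random size×size window from every slice of a focal stack.

    stack : array of shape (N, H, W, C), N focus slices of one scene
    size  : side length of the square crop (256 in the paper's setup)
    """
    if rng is None:
        rng = np.random.default_rng()
    _, h, w, _ = stack.shape
    top = rng.integers(0, h - size + 1)   # random top-left corner,
    left = rng.integers(0, w - size + 1)  # shared by all focus slices
    return stack[:, top:top + size, left:left + size, :]

# usage: a 5-slice stack of 300×300 RGB images yields a (5, 256, 256, 3) patch
stack = np.zeros((5, 300, 300, 3))
patch = random_crop_stack(stack, size=256)
```

Cropping all focus slices with one shared window keeps the stack spatially aligned, which matters for focus-based depth cues.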
Hardware Specification: Yes. "We build our network model using the PyTorch framework and train and test it on a single NVIDIA GeForce RTX 4090."
Software Dependencies: No. The paper states, "We build our network model using the PyTorch framework" and mentions using the Adam optimizer, but it does not provide version numbers for PyTorch or any other software libraries that would be required to replicate the experiments.
Experiment Setup: Yes. "During training, we employ the Adam optimizer (β1 = 0.9, β2 = 0.99) with an initial learning rate of 10⁻³ and a batch size of 4. Additionally, we apply data augmentation techniques such as image flipping, cropping, rotation, and gamma correction. Images are randomly cropped into 256×256 patches and fed into the network. For our depth estimation method, we optimize the entire model by comparing predicted pixel depths to ground-truth depths with a multi-scale weighted loss function. The specific loss function is defined as L = Σ_{i=1}^{4} ωᵢ ||Dᵢ − D_gt||₂, where ||·||₂ denotes the L2 loss and D_gt represents the ground-truth depth map; i ∈ {1, 2, 3, 4} indexes the predicted depth maps at the different pyramid-like scales. In this model, ωᵢ is set to 0.3, 0.5, 0.7, and 1, respectively."
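The multi-scale weighted loss quoted above can be sketched in NumPy. This is an illustrative reading of the formula, not the authors' code; the helper name and the assumption that each pyramid-scale prediction has already been resized to the ground-truth resolution are mine:

```python
import numpy as np

def multiscale_l2_loss(preds, d_gt, weights=(0.3, 0.5, 0.7, 1.0)):
    """L = sum_i omega_i * ||D_i - D_gt||_2 over pyramid-scale predictions.

    preds   : list of 4 predicted depth maps, each the same shape as d_gt
    d_gt    : ground-truth depth map
    weights : omega_i from the paper (0.3, 0.5, 0.7, 1.0)
    """
    return sum(w * np.linalg.norm((p - d_gt).ravel(), ord=2)
               for w, p in zip(weights, preds))

# toy usage: every scale predicts 1.5 against a ground truth of 1.0 on a 4×4 map,
# so each term's L2 norm is sqrt(16 * 0.25) = 2.0 and the weights sum to 2.5
gt = np.ones((4, 4))
preds = [np.full((4, 4), 1.5) for _ in range(4)]
loss = multiscale_l2_loss(preds, gt)  # 2.5 * 2.0 = 5.0
```

The increasing weights (0.3 to 1.0) put the most emphasis on the finest-scale prediction, which is the one ultimately output as the depth map.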