RESfM: Robust Deep Equivariant Structure from Motion

Authors: Fadi Khatib, Yoni Kasten, Dror Moran, Meirav Galun, Ronen Basri

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental Experiments demonstrate that our method can be applied successfully in realistic settings that include large image collections and point tracks extracted with common heuristics that include many outliers, achieving state-of-the-art accuracies in almost all runs, superior to existing deep-based methods and on-par with leading classical (non-deep) sequential and global methods. ... We compare our method to this equivariant SfM method where, for fairness, we replace BA with our robust BA. ... Our results for the MegaDepth and 1DSfM test scenes are shown in Tables 1 and 2, respectively. Each table lists the number of input images (Nc), the fraction of outlier track points, and our results compared to the baseline methods.
Researcher Affiliation Collaboration Fadi Khatib¹, Yoni Kasten², Dror Moran¹, Meirav Galun¹, Ronen Basri¹; ¹Weizmann Institute of Science, ²NVIDIA
Pseudocode No The paper describes the network architecture and various steps involved in the methodology in detailed paragraph text and a block diagram (Figure 2). However, it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like formatting.
Open Source Code Yes The project page: https://robust-equivariant-sfm.github.io/ ... Our code and preprocessed point tracks data will be made publicly available.
Open Datasets Yes Our network is trained on scenes from the MegaDepth dataset (Li & Snavely, 2018). It is then tested on both novel scenes from the MegaDepth dataset and in cross-dataset generalization tests on the 1DSfM dataset (Wilson & Snavely, 2014), Strecha (Strecha et al., 2008), and BlendedMVS (Yao et al., 2020).
Dataset Splits Yes During training, we processed all training scenes sequentially for each epoch. For each scene, we randomly selected a subset of 10%-20% of the images. We employed a validation set for early stopping, selecting the checkpoint with minimal error. ... From the first group, we randomly sampled 27 scenes to serve as our training dataset, along with four scenes designated for validation purposes. For the test set, we randomly selected 14 scenes from the first group. Moreover, from each scene in the second group, we randomly sampled 300 images to represent a condensed version of the scene.
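The per-scene subsampling described in the training protocol above (a random 10%–20% of each scene's images per epoch) can be sketched as follows. The function and variable names are illustrative, not taken from the released code:

```python
import random

def sample_scene_subset(image_ids, frac_min=0.10, frac_max=0.20, seed=None):
    """Randomly select a 10%-20% subset of a scene's images, as the
    training protocol describes (names here are hypothetical)."""
    rng = random.Random(seed)
    frac = rng.uniform(frac_min, frac_max)          # fraction drawn per scene
    k = max(1, round(frac * len(image_ids)))        # at least one image
    return rng.sample(image_ids, k)                 # sample without replacement

subset = sample_scene_subset(list(range(100)), seed=0)
print(len(subset))  # between 10 and 20 for a 100-image scene
```

Drawing a fresh fraction each epoch exposes the network to varying collection sizes, which matches the paper's goal of handling image sets of different scales.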
Hardware Specification Yes Our method was trained and evaluated on NVIDIA A40 GPUs (48GB of GPU Memory).
Software Dependencies No We used PyTorch (Paszke et al., 2019) as the deep learning framework and the ADAM optimizer (Kingma & Ba, 2014) with normalized gradients. ... For the bundle adjustment, we employed the Ceres Solver's implementation of bundle adjustment (Agarwal et al.) with the Huber loss to enhance robustness (with parameter 0.1). The paper mentions software tools like PyTorch, the ADAM optimizer, and the Ceres Solver, but does not provide specific version numbers for these components. The year in a citation (e.g., PyTorch (Paszke et al., 2019)) is not a specific version number.
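The row above notes that bundle adjustment is robustified with a Huber loss (parameter 0.1). A minimal sketch of that penalty, assuming the standard Huber definition with delta = 0.1 applied to a scalar reprojection residual (the actual pipeline uses Ceres Solver's built-in loss):

```python
def huber(r, delta=0.1):
    """Standard Huber penalty: quadratic for small residuals, linear in the
    tails, so outlier reprojection residuals contribute only linearly."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)

print(round(huber(0.05), 6))  # quadratic region
print(round(huber(1.0), 6))   # linear tail: grows with |r|, not r**2
```

The linear tail is what makes BA tolerant to the many outlier track points the paper targets: a gross residual of 1.0 is penalized far less than the quadratic loss would demand.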
Experiment Setup Yes Our loss function combines two terms, L = L_outliers + α · L_reprojection, where the hyperparameter α = 10 balances the two terms and was determined by a hyperparameter search. ... we fine-tune the network for the tested scene by minimizing the unsupervised reprojection loss (equation 3) for 1000 epochs. ... We tried different implementation hyper-parameters including (1) learning rates {1e-2, 1e-3, 1e-4}, (2) network width {128, 256, 512} for the encoder E and the heads, (3) number of layers {2, 3, 4, 5} in these networks, and (4) threshold for outlier removal {0.4, 0.5, 0.6, 0.7, 0.8}.
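The combined objective quoted above, L = L_outliers + α · L_reprojection with α = 10, can be sketched as a toy scalar version (in the paper both terms are computed from network outputs over all point tracks; the function name here is hypothetical):

```python
ALPHA = 10.0  # weight reported by the paper's hyperparameter search

def combined_loss(l_outliers, l_reprojection, alpha=ALPHA):
    """Total training loss L = L_outliers + alpha * L_reprojection."""
    return l_outliers + alpha * l_reprojection

print(combined_loss(0.25, 0.05))  # 0.25 + 10 * 0.05 = 0.75
```

With α = 10, even a modest reprojection error dominates the total, which is consistent with the reported test-time fine-tuning that minimizes the reprojection term alone.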