FlowBench: Benchmarking Optical Flow Estimation Methods for Reliability and Generalization
Authors: Shashank Agnihotri, Julian Yuya Caspary, Luca Schwarz, Xinyan Gao, Jenny Schmalfuss, Andrés Bruhn, Margret Keuper
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | FlowBench facilitates streamlined research into the reliability of optical flow methods by benchmarking their robustness to adversarial attacks and out-of-distribution samples. With FlowBench, we benchmark 57 checkpoints across 3 datasets under 9 diverse adversarial attacks and 23 established common corruptions, making it the most comprehensive robustness analysis of optical flow methods to date. |
| Researcher Affiliation | Academia | Shashank Agnihotri EMAIL Data and Web Science Group, University of Mannheim, Germany Julian Yuya Caspary EMAIL Data and Web Science Group, University of Mannheim, Germany Luca Schwarz EMAIL Data and Web Science Group, University of Mannheim, Germany Xinyan Gao EMAIL Data and Web Science Group, University of Mannheim, Germany Jenny Schmalfuss EMAIL Computer Vision Group, University of Stuttgart, Germany Andrés Bruhn EMAIL Computer Vision Group, University of Stuttgart, Germany Margret Keuper EMAIL Data and Web Science Group, University of Mannheim, Germany Max-Planck-Institute for Informatics, Saarland Informatics Campus, Germany |
| Pseudocode | No | The paper describes algorithms and methods using mathematical equations and textual explanations, but it does not contain a dedicated section or figure explicitly labeled as "Pseudocode" or "Algorithm" with structured, code-like steps. |
| Open Source Code | Yes | The open-source code and weights for FlowBench are available in this GitHub repository. ... FlowBench is completely open-source, allowing the community to generate pull requests to add new methods, attacks, checkpoints, benchmarking results, and metrics, and thus pursue these directions of work as well. ... The proposed FlowBench benchmarking tool is available as a library in the following codebase: https://github.com/shashankskagnihotri/FlowBench. |
| Open Datasets | Yes | FlowBench supports 37 unique architectures, for example, RAFT, FlowFormer, FlowFormer++, CCMR, and others (new architectures added to ptlflow over time are compatible with FlowBench), and four distinct datasets, namely the FlyingThings3D (Mayer et al., 2016), KITTI2015 (Menze & Geiger, 2015), MPI Sintel (Butler et al., 2012) (clean and final), and Spring (Mehl et al., 2023) datasets. |
| Dataset Splits | Yes | KITTI2015: Proposed by Menze & Geiger (2015), this dataset is focused on the real-world driving scenario. It contains a total of 400 pairs of image frames, split equally for training and testing. ... MPI Sintel: Proposed by Butler et al. (2012) and Wulff et al. (2012), this dataset ... consists of a total of 1064 synthetic frames for training and 564 synthetic frames for testing |
| Hardware Specification | Yes | Most experiments were done on a single 40 GB NVIDIA Tesla V100 GPU each; however, MS-RAFT+, FlowFormer, and FlowFormer++ are more compute-intensive, and thus 80 GB NVIDIA A100 or NVIDIA H100 GPUs were used for these models, with a single GPU for each experiment. |
| Software Dependencies | No | The paper mentions that FlowBench is built using ptlflow (Morimitsu, 2021) and refers to PyTorch (Paszke et al., 2019) for calculation approximations. However, specific version numbers for these software libraries used in the experiments are not provided (e.g., "PyTorch 1.9" or "ptlflow 1.2"). The years refer to publications, not explicit software versions. |
| Experiment Setup | Yes | For calculating TARE and NARE values we used the BIM, PGD, and CosPGD attacks with step size α = 0.01 and perturbation budget ϵ = 8/255 under the ℓ∞-norm bound... We use 20 attack iterations for calculating TARE and NARE... For finetuning, each mini-batch is adversarially attacked and used to finetune the model. We report results from a subset of methods adversarially finetuned (10k iterations with a starting learning rate = 10^-6 and the same learning rate scheduler as used by the method during training) on the KITTI2015 training dataset using an ℓ∞-norm constrained 3-iteration PGD attack with ϵ ∈ {4/255, 8/255} and α = 0.01... For PCFA: perturbation budget ϵ = 0.05 and step size α = 1e-7... Adversarial Weather: Snow (random snowflakes), number of particles: 3000, number of optimization steps: 750 |
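
The ℓ∞-bounded PGD configuration quoted in the Experiment Setup row (step size α = 0.01, budget ϵ = 8/255, 20 iterations) can be sketched as below. This is a generic illustration of the attack's update-and-project loop, not the paper's implementation; `grad_fn` is a hypothetical stand-in for the gradient of the flow-error loss with respect to the input frames.

```python
import numpy as np

def pgd_linf(x, grad_fn, alpha=0.01, eps=8 / 255, iters=20):
    """Generic ℓ∞-bounded PGD sketch: take signed-gradient ascent steps
    on the loss and project the perturbation back into the ε-ball.

    x        : clean input in [0, 1] (e.g. a flattened image pair)
    grad_fn  : returns the gradient of the attacked loss at a point
               (hypothetical placeholder for autograd on the flow loss)
    """
    delta = np.zeros_like(x)
    for _ in range(iters):
        g = grad_fn(x + delta)                    # loss gradient at the current adversarial point
        delta = delta + alpha * np.sign(g)        # signed gradient ascent step of size α
        delta = np.clip(delta, -eps, eps)         # project onto the ℓ∞ ε-ball around x
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep the perturbed input in valid range
    return x + delta
```

With a toy quadratic loss ½‖x − t‖² (gradient `x − t`), the attack pushes the input away from the target `t` while the perturbation stays within ϵ = 8/255 per pixel.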