Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion

Authors: Minkyoung Cho, Yulong Cao, Jiachen Sun, Qingzhao Zhang, Marco Pavone, Jeong Joon Park, Heng Yang, Zhuoqing Mao

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In our evaluation on the nuScenes dataset (Caesar et al., 2019), Cocoon demonstrates notable improvement in both accuracy and robustness, consistently outperforming static and other adaptive fusion methods across normal and challenging scenarios, including natural and artificial corruptions. Furthermore, we demonstrate the validity and efficacy of our uncertainty metric across diverse datasets.
Researcher Affiliation Collaboration 1 University of Michigan, 2 NVIDIA Research, 3 Stanford University, 4 Harvard University
Pseudocode No The paper describes the Cocoon framework and its components (base object detector, feature alignment, uncertainty quantification, and adaptive fusion) in detail, including mathematical formulations for the nonconformity function and training objectives. However, it does not present these steps in a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code No The paper provides links to the base models FUTR3D and TransFusion in Appendix N. However, it does not provide an explicit link or statement indicating the immediate public availability of the code for the Cocoon methodology itself. The project website linked in the header states that the code will be made publicly available (implying a future release), but it is not yet hosted there.
Open Datasets Yes Our evaluations utilize the nuScenes dataset (Caesar et al., 2019), a comprehensive autonomous driving dataset collected from vehicles equipped with a 32-beam LiDAR and 6 RGB cameras. ... Following previous studies, we also use 10 uni-dimensional real-world datasets for validation. Several datasets (bio, bike, community, facebook1, and facebook2) are sourced from the UCI Machine Learning Repository (Kelly et al., 2023; Singh, 2016). Additionally, we utilize data from the BlogFeedback dataset (Buza, 2014), the STAR dataset (Achilles et al., 2008), and the Medical Expenditure Panel Survey datasets (meps19, meps20, and meps21) (Cohen et al., 2009).
Dataset Splits Yes For conformal prediction, we partition the original training dataset into a proper training set and a calibration set with a 6:1 ratio.
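The 6:1 proper-training/calibration split quoted above is the standard setup for split conformal prediction. A minimal sketch of that procedure is shown below; the random scores and the variable names are placeholders for illustration, not the paper's actual features or nonconformity function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder nonconformity scores standing in for the paper's real ones.
n = 700
scores = rng.normal(size=n)

# Partition into proper training and calibration sets at the paper's 6:1 ratio.
n_cal = n // 7                        # 1 part calibration, 6 parts proper training
perm = rng.permutation(n)
cal_idx, train_idx = perm[:n_cal], perm[n_cal:]

# Split conformal prediction: the finite-sample-corrected (1 - alpha) quantile
# of the calibration nonconformity scores serves as the coverage threshold.
alpha = 0.1
cal_scores = scores[cal_idx]
q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
threshold = np.quantile(cal_scores, q_level)
```

At test time, a prediction set keeps every candidate whose nonconformity score falls at or below `threshold`, which guarantees marginal coverage of roughly 1 - alpha under exchangeability.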
Hardware Specification Yes For computational resources, 4 A40 GPUs were used for training, and 1 RTX 2080 for calibration and inference.
Software Dependencies No The paper mentions base models like FUTR3D and Trans Fusion and general concepts like ResNet and VoxelNet, but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup Yes In Eq. 3, we set δ = 5/num_queries, ζ = 3/num_queries, and η = 1/(7 · num_queries). See Appendix D for these coefficient values and further details.