Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning 3D Perception from Others' Predictions

Authors: Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Weinberger, Bharath Hariharan, Wei-Lun Chao

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car.
Researcher Affiliation | Academia | Jinsu Yoo^1, Zhenyang Feng^1, Tai-Yu Pan^1, Yihong Sun^2, Cheng Perng Phoo^2, Xiangyu Chen^2, Mark Campbell^2, Kilian Q. Weinberger^2, Bharath Hariharan^2, Wei-Lun Chao^1. ^1 The Ohio State University; ^2 Cornell University.
Pseudocode | No | The paper describes the overall pipeline in Section 3.5 using numbered steps but does not present it in a formally structured pseudocode or algorithm block.
Open Source Code | No | We plan to make our implementation publicly available to promote reproducibility.
Open Datasets | Yes | We conduct experiments primarily on the V2V4Real dataset (Xu et al., 2023), which consists of 40 clips with a total of 18k frames by driving two cars, Tesla and Honda, together within 100m. (Please see additional results on the OPV2V dataset (Xu et al., 2022c) in the supplementary.)
Dataset Splits | Yes | To align with our research purpose, we re-split the original data into three portions: "R pretraining", "R prediction/E training", and "E validation/test" (Fig. 8). Specifically, we split them into two subsets containing 20 clips and use the first subset to pre-train R's detector f_R. Then we run inference on the second subset to provide pseudo labels Y_R for training E's detector f_E together with E's point clouds. We validate and test E's performance on the first subset by splitting it into 20% and 80%. Our re-split gives 4,488 frames for R pretraining, 4,463 frames for E training, and 870 and 3,618 frames for E validation/test, respectively.
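The frame counts quoted above can be sanity-checked in a few lines of Python. The totals below are taken directly from the quote; note that the 20%/80% validation/test split appears to be made at the clip level, which explains why the counts deviate slightly from exact percentages:

```python
# Frame counts quoted in the dataset-split description above.
r_pretrain = 4488           # first subset: R pretraining
e_train = 4463              # second subset: E training with pseudo labels
e_val, e_test = 870, 3618   # first subset, re-split for E validation/test

# The validation/test split reuses the first subset, so its two parts
# must add back up to the R-pretraining frame count.
assert e_val + e_test == r_pretrain

# The split is roughly 20%/80%; clip-level splitting makes it inexact.
print(e_val / r_pretrain)  # ~0.194
```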
Hardware Specification | Yes | We train it for 60 epochs with a batch size of 64 on 8 NVIDIA Tesla P100 GPUs.
Software Dependencies | No | We conduct experiments with PointPillars (Lang et al., 2019) as the default detector. We adopt a neural network similar to PointNet (Qi et al., 2017) for the ranker for its simplicity.
Experiment Setup | Yes | We train it for 60 epochs with a batch size of 64 on 8 NVIDIA Tesla P100 GPUs. We use the Adam optimizer and an initial learning rate of 2e-3 dropped to 2e-5 by a cosine-annealing decay strategy (Loshchilov & Hutter, 2017). For curriculum learning, we set T_E-R to 40m. We also set the ranker threshold to 0.5, and λ for the distance-based threshold to 1 with a fixed confidence threshold T_c of 0.2.
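The learning-rate schedule in the quoted setup (initial LR 2e-3 annealed to 2e-5 over 60 epochs) corresponds to the standard cosine-annealing formula of Loshchilov & Hutter. A minimal sketch, assuming the plain schedule with no warm restarts (the quote does not specify restarts or warm-up):

```python
import math

def cosine_annealed_lr(epoch, total_epochs=60, lr_max=2e-3, lr_min=2e-5):
    """Cosine annealing from lr_max down to lr_min over total_epochs.

    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T)).
    Matches the quoted hyperparameters (2e-3 -> 2e-5 over 60 epochs);
    the authors' exact scheduler variant is an assumption here.
    """
    cos_term = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos_term)

print(cosine_annealed_lr(0))   # 0.002 (start of training)
print(cosine_annealed_lr(60))  # 2e-05 (end of training)
```

The same schedule is available off the shelf as `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=60` and `eta_min=2e-5`.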